A Time-Efficient Quantum Walk for 
3-Distinctness Using Nested Updates* 

Andrew M. Childs^{1,3}, Stacey Jeffery^{2,3}, Robin Kothari^{2,3}, and Frédéric Magniez^{4}



^1 Department of Combinatorics & Optimization, University of Waterloo, Canada
^2 David R. Cheriton School of Computer Science, University of Waterloo, Canada
^3 Institute for Quantum Computing, University of Waterloo, Canada
^4 CNRS, LIAFA, Univ Paris Diderot, Sorbonne Paris-Cité, France

Abstract 

We present an extension to the quantum walk search framework that facilitates quantum walks with nested updates. We apply it to give a quantum walk algorithm for 3-Distinctness with query complexity $\tilde{O}(n^{5/7})$, matching the best known upper bound (obtained via learning graphs) up to log factors. Furthermore, our algorithm has time complexity $\tilde{O}(n^{5/7})$, improving the previous $\tilde{O}(n^{3/4})$.

1 Introduction 

Element Distinctness is a basic computational problem. Given a sequence $x = x_1, \ldots, x_n$ of $n$ integers, the task is to decide if those elements are pairwise distinct. This problem is closely related to Collision, a fundamental problem in cryptanalysis. Given a 2-to-1 function $f : [n] \to [n]$, the aim is to find $a \neq b$ such that $f(a) = f(b)$. One of the best (classical and quantum) algorithms is to run Element Distinctness on $f$ restricted to a random subset of size $\sqrt{n}$.

In the quantum setting, Element Distinctness has received a lot of attention. The first non-trivial algorithm used $\tilde{O}(n^{3/4})$ time [BDH+05]. The optimal $\tilde{O}(n^{2/3})$ algorithm is due to Ambainis [Amb04], who introduced an approach based on quantum walks that has become a major tool for quantum query algorithms. The optimality of this algorithm follows from a query lower bound for Collision [AS04]. In the query model, access to the input $x$ is provided by an oracle whose answer to query $i \in [n]$ is $x_i$. This model is the quantum analog of classical decision tree complexity: the only resource measured is the number of queries to the input.

Quantum query complexity has been a very successful model for studying the power of quantum computation. In particular, quantum query complexity has been exactly characterized in terms of a semidefinite program, the general adversary bound [Rei11, LMR+11]. To design quantum query algorithms, it suffices to exhibit a solution to this semidefinite program. However, this turns out to be difficult in general, as the minimization form of the general adversary bound has exponentially many constraints. Belovs [Bel12b] recently introduced the model of learning graphs, which can be viewed as the minimization form of the general adversary bound with additional structure imposed on the form of the solution. This additional structure makes learning graphs much easier to reason about. The learning graph model has already been used to improve the query complexity of many graph problems [Bel12b, LMS11, LMS13] as well as $k$-Distinctness [Bel12a].

*Support for this work was provided by NSERC, the Ontario Ministry of Research and Innovation, the US ARO, the French ANR Blanc project ANR-12-BS02-005 (RDAM), and the European Commission IST STREP project 25596 (QCS).
amchilds@uwaterloo.ca
sjeffery@uwaterloo.ca
rkothari@cs.uwaterloo.ca
frederic.magniez@univ-paris-diderot.fr

One shortcoming of learning graphs is that these upper bounds do not lead explicitly to efficient 
algorithms in terms of time complexity. Although the study of query complexity is interesting on 
its own, it is relevant in practice only when a query lower bound is close to the best known time 
complexity. 

Recently, [JKM13] reproduced several known learning graph upper bounds via explicit algorithms in an extension of the quantum walk search framework of [MNRS11]. This work produced a new quantum algorithmic tool, quantum walks with nested checking. Algorithms constructed in the framework of [JKM13] can be interpreted as quantum analogs of randomized algorithms, so they are simple to design and analyze for any notion of cost, including time as well as query complexity. This framework has interpreted all known learning graphs as quantum walks, except the very recent adaptive learning graphs for $k$-Distinctness [Bel12a].

In $k$-Distinctness, the problem is to decide if there are $k$ copies of the same element in the input, with $k = 2$ being Element Distinctness. The best lower bound for $k$-Distinctness is the Element Distinctness lower bound $\Omega(n^{2/3})$, whereas the best query upper bound is $O(n^{1 - 2^{k-2}/(2^k - 1)}) = o(n^{3/4})$ [Bel12a], achieved using learning graphs, improving the previous bound of $O(n^{k/(k+1)})$ [Amb04]. However, the best known time complexity remained $\tilde{O}(n^{k/(k+1)})$. We improve this upper bound for the case when $k = 3$.
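To make the two exponents concrete, the following small sketch evaluates them exactly for a few values of $k$. The function names are ours and purely illustrative; the formulas are the learning-graph exponent $1 - 2^{k-2}/(2^k-1)$ and the earlier quantum walk exponent $k/(k+1)$ quoted above.

```python
from fractions import Fraction


def learning_graph_exponent(k: int) -> Fraction:
    """Exponent 1 - 2^(k-2) / (2^k - 1) of the learning-graph query bound."""
    return 1 - Fraction(2 ** (k - 2), 2 ** k - 1)


def earlier_walk_exponent(k: int) -> Fraction:
    """Exponent k / (k + 1) of the earlier quantum walk bound."""
    return Fraction(k, k + 1)


if __name__ == "__main__":
    for k in (2, 3, 4):
        print(k, learning_graph_exponent(k), earlier_walk_exponent(k))
```

For $k = 3$ the two exponents are $5/7$ and $3/4$, which is the gap in time complexity that this paper closes.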

Our algorithm for 3-Distinctness is conceptually simple: we walk on sets of 2-collisions and 
look for a set containing a 2-collision that is part of a 3-collision. We check if a set has this 
property by searching for an index that evaluates to the same value as one of the 2-collisions in the 
set. However, to move to a new set of 2-collisions, we need to use a quantum walk subroutine for 
finding 2-collisions as part of our update step. This simple idea is surprisingly difficult to implement 
and leads us to develop a new extension of the quantum walk search framework. 

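Given a Markov chain $P$ with spectral gap $\delta$ and success probability $\varepsilon$ in its stationary distribution, one can construct a quantum search algorithm with cost $\mathsf{S} + \frac{1}{\sqrt{\varepsilon}}\left(\frac{1}{\sqrt{\delta}}\mathsf{U} + \mathsf{C}\right)$ [MNRS11], where $\mathsf{S}$, $\mathsf{U}$ and $\mathsf{C}$ are respectively the setup, update, and checking costs of the quantum analog of $P$. Using a quantum walk algorithm with costs $\mathsf{S}', \mathsf{U}', \mathsf{C}', \varepsilon', \delta'$ (as in [MNRS11]) as a checking subroutine straightforwardly gives complexity $\mathsf{S} + \frac{1}{\sqrt{\varepsilon}}\left(\frac{1}{\sqrt{\delta}}\mathsf{U} + \mathsf{S}' + \frac{1}{\sqrt{\varepsilon'}}\left(\frac{1}{\sqrt{\delta'}}\mathsf{U}' + \mathsf{C}'\right)\right)$. Using nested checking [JKM13], the cost can be reduced to $\mathsf{S} + \mathsf{S}' + \frac{1}{\sqrt{\varepsilon}}\left(\frac{1}{\sqrt{\delta}}\mathsf{U} + \frac{1}{\sqrt{\varepsilon'}}\left(\frac{1}{\sqrt{\delta'}}\mathsf{U}' + \mathsf{C}'\right)\right)$.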

It is natural to ask if a quantum walk subroutine can be used for the update step in a similar manner to obtain cost $\mathsf{S} + \mathsf{S}' + \frac{1}{\sqrt{\varepsilon}}\left(\frac{1}{\sqrt{\delta}}\frac{1}{\sqrt{\varepsilon'}}\left(\frac{1}{\sqrt{\delta'}}\mathsf{U}' + \mathsf{C}'\right) + \mathsf{C}\right)$. In most applications, the underlying walk is independent of the input, so the update operation is simple, but for some applications a more complex update may be useful (as in [CK11], where Grover search is used for the update). In Section 2.3, we describe an example showing that it is not even clear how to use a nested quantum walk for the update with the seemingly trivial cost $\mathsf{S} + \frac{1}{\sqrt{\varepsilon}}\left(\frac{1}{\sqrt{\delta}}\left(\mathsf{S}' + \frac{1}{\sqrt{\varepsilon'}}\left(\frac{1}{\sqrt{\delta'}}\mathsf{U}' + \mathsf{C}'\right)\right) + \mathsf{C}\right)$. Nevertheless, despite the difficulties that arise in implementing nested updates, we show in Section 3.2 how to achieve the more desirable cost expression in certain cases, and a similar one in general.

To accomplish this, we extend the quantum walk search framework by introducing the concept of coin-dependent data. This allows us to implement nested updates, with a quantum walk subroutine carrying out the update procedure. Superficially, our modification appears small. Indeed, the proof of the complexity of our framework is nearly the same as that of [MNRS11]. However, there are some subtle differences in the implementation of the walk.

As in [JKM13], this concept is simple yet powerful. We demonstrate this by constructing a quantum walk version of the learning graph for 3-Distinctness with matching query complexity (up to poly-logarithmic factors). Because quantum walks are easy to analyze, the time complexity, which matches the query complexity, follows easily, answering an open problem of [Bel12a].

Independently, Belovs [Bel13] also recently obtained a time-efficient implementation of his learning graph for 3-Distinctness. His approach also uses quantum walks, but beyond this similarity, the algorithm appears quite different. In particular, it is based on another framework of search via quantum walk due to Szegedy [Sze04, MNRS12], whereas our approach uses a new extension of the quantum walk search framework of [MNRS11].

2 Preliminaries and Motivation 
2.1 Quantum Walks 

Consider a reversible, ergodic Markov chain $P$ on a connected, undirected graph $G = (X, E)$ with spectral gap $\delta > 0$ and stationary distribution $\pi$. Let $M \subseteq X$ be a set of marked vertices. Our goal is to detect whether $M = \emptyset$ or $\Pr_{x \sim \pi}(x \in M) \geq \varepsilon$, for some given $\varepsilon > 0$. Consider the following randomized algorithm that finds a marked element with bounded error.

1. Sample $x$ from $\pi$

2. Repeat for $O(1/\varepsilon)$ steps

(a) If the current vertex $x$ is marked, then stop and output $x$

(b) Otherwise, simulate $O(1/\delta)$ steps of $P$ starting with $x$

3. If the algorithm has not terminated, output 'no marked element' 
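The three steps above can be sketched classically as follows. This is only an illustration under stated assumptions: the constant `c` stands in for the constants hidden by the $O(\cdot)$ notation, and the toy chain (a uniform walk on a complete graph with self-loops, which has spectral gap 1) is our own choice, not an example from the paper.

```python
import random


def classical_walk_search(sample_pi, step, is_marked, eps, delta, rng, c=4):
    """Randomized search on a Markov chain; c is an assumed constant for O(.)."""
    x = sample_pi(rng)                        # 1. sample x from pi
    for _ in range(int(c / eps) + 1):         # 2. repeat O(1/eps) times
        if is_marked(x):                      # (a) stop on a marked vertex
            return x
        for _ in range(int(c / delta) + 1):   # (b) O(1/delta) steps of P
            x = step(x, rng)
    return None                               # 3. 'no marked element'


# Toy chain (assumption): uniform walk on 16 vertices, stationary distribution uniform.
N_VERTICES = 16


def sample(rng):
    return rng.randrange(N_VERTICES)


def step(x, rng):
    return rng.randrange(N_VERTICES)


if __name__ == "__main__":
    rng = random.Random(1)
    marked = {3}
    print(classical_walk_search(sample, step, lambda v: v in marked,
                                eps=1 / N_VERTICES, delta=1.0, rng=rng))
```

The quantum algorithm replaces the $O(1/\varepsilon)$ and $O(1/\delta)$ repetitions by roughly their square roots, which is where the cost expression of Theorem 2.1 comes from.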

This algorithm has been quantized by [MNRS11], leading to efficient quantum query algorithms. Since each step has to be unitary and therefore reversible, we have to implement the walk carefully. The quantization considers $P$ as a walk on edges of $E$. We write $(x, y) \in \vec{E}$ when we consider an edge $\{x, y\} \in E$ with orientation $(x, y)$. The notation $(x, y)$ intuitively means that the current vertex of the walk is $x$ and the coin, indicating the next move, is $y$. Swapping $x$ and $y$ changes the current vertex to $y$; then the coin becomes $x$.

The quantum algorithm may carry some data structure while walking on $G$; we formalize this as follows. Let $0$ be a state outside $X$. Define $D : X \cup \{0\} \to \mathcal{D}$ for some Hilbert space $\mathcal{D}$, with $|D(0)\rangle = |0\rangle$. We define costs associated with the main steps of the algorithm. By cost we mean any measure of complexity such as query, time or space.

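Setup cost: Let $\mathsf{S}$ be the cost of constructing

\[ |\pi\rangle := \sum_{x \in X} \sqrt{\pi(x)}\, |x\rangle |D(x)\rangle \sum_{y \in X} \sqrt{P(x,y)}\, |y\rangle |D(y)\rangle . \]

Update cost: Let $\mathsf{U}$ be the cost of the Local Diffusion operation, which is controlled on the first two registers^1 and acts as

\[ |x\rangle |D(x)\rangle |0\rangle |D(0)\rangle \mapsto |x\rangle |D(x)\rangle \sum_{y \in X} \sqrt{P(x,y)}\, |y\rangle |D(y)\rangle . \]

Checking cost: Let $\mathsf{C}$ be the cost of the reflection

\[ |x\rangle |D(x)\rangle \mapsto \begin{cases} -|x\rangle |D(x)\rangle & \text{if } x \in M \\ |x\rangle |D(x)\rangle & \text{otherwise.} \end{cases} \]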



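Theorem 2.1 ([MNRS11]). Let $P$ be a reversible, ergodic Markov chain on $G = (X, E)$ with spectral gap $\delta > 0$. Let $M \subseteq X$ be such that $\Pr_{x \sim \pi}(x \in M) \geq \varepsilon$, for some $\varepsilon > 0$, whenever $M \neq \emptyset$. Then there is a quantum algorithm that finds an element of $M$, if $M \neq \emptyset$, with bounded error and with cost

\[ O\left(\mathsf{S} + \frac{1}{\sqrt{\varepsilon}}\left(\frac{1}{\sqrt{\delta}}\mathsf{U} + \mathsf{C}\right)\right). \]

Furthermore, we can approximately map $|\pi\rangle$ to $|\pi(M)\rangle$, the normalized projection of $|\pi\rangle$ onto $\mathrm{span}\{|x\rangle |D(x)\rangle |y\rangle |D(y)\rangle : x \in M, y \in X\}$, in cost $\frac{1}{\sqrt{\varepsilon}}\left(\frac{1}{\sqrt{\delta}}\mathsf{U} + \mathsf{C}\right)$.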

2.2 3-Distinctness 

We suppose that the input is a sequence $x = x_1, \ldots, x_n$ of integers from $[q] := \{1, \ldots, q\}$. We model the input as an oracle whose answer to query $i \in [n]$ is $x_i$.

We make the simplifying assumptions that there is at most one 3-collision and that the number of 2-collisions is in $\Theta(n)$. The first assumption is justified in [Amb04, Section 5]. To justify the second assumption, note that given an input $x \in [q]^n$, we can construct an input $x'$ of length $3n$ over a larger alphabet with the same 3-collisions as $x$, and $\Theta(n)$ 2-collisions, by defining $x'_i = x_i$ for $i \in [n]$ and $x'_i = x'_{i+n} = q + i$ for $i \in \{n + 1, \ldots, 2n\}$. Note that any two 2-collisions not both part of the 3-collision are disjoint.
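The padding construction can be checked on a small instance. The sketch below is illustrative only: the helper names are ours, and the list is 0-indexed in code while the text is 1-indexed.

```python
from collections import Counter


def pad_with_two_collisions(x, q):
    """Build x' of length 3n: x'_i = x_i for i in [n], and
    x'_i = x'_{i+n} = q + i for i in {n+1, ..., 2n} (1-indexed as in the text)."""
    n = len(x)
    fresh = [q + i for i in range(n + 1, 2 * n + 1)]  # n new values, all larger than q
    return list(x) + fresh + fresh


def multiplicity_profile(seq):
    """Map each multiplicity to the number of distinct values occurring that often."""
    return Counter(Counter(seq).values())


if __name__ == "__main__":
    x = [1, 2, 3, 2, 2]          # one 3-collision (value 2), alphabet size q = 3
    x_padded = pad_with_two_collisions(x, q=3)
    print(x_padded)
    print(multiplicity_profile(x_padded))
```

Each fresh value appears exactly twice, so the padded input gains $n$ disjoint 2-collisions while the set of 3-collisions is unchanged.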

A common simplifying technique is to randomly partition the space $[n]$ and assume that the solution respects the partition in some sense. Here we partition the space into three disjoint sets of equal size, $\mathcal{A}_1$, $\mathcal{A}_2$ and $\mathcal{A}_3$, and assume that if there is a 3-collision $\{i, j, k\}$, then we have $i \in \mathcal{A}_1$, $j \in \mathcal{A}_2$ and $k \in \mathcal{A}_3$. This assumption holds with constant probability, so we need only repeat the algorithm $O(1)$ times with independent choices of the tripartition to find any 3-collision with high probability. Thus, we assume we have such a partition.
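As a sanity check on the claim that the assumption holds with constant probability, the sketch below computes the exact probability that three fixed indices land in the first, second and third part of a uniformly random equal-size tripartition, and compares it with brute-force enumeration for a tiny $n$. The function names are ours, and $n$ is assumed divisible by 3.

```python
from fractions import Fraction
from itertools import permutations


def respect_probability(n: int) -> Fraction:
    """Pr[i in part 1, j in part 2, k in part 3] for fixed distinct i, j, k."""
    part = Fraction(n, 3)
    return part / n * part / (n - 1) * part / (n - 2)


def respect_probability_brute_force(n: int) -> Fraction:
    """Enumerate all labelings of n positions by equal-size parts 1, 2, 3."""
    labels = [1] * (n // 3) + [2] * (n // 3) + [3] * (n // 3)
    perms = list(permutations(labels))
    good = sum(1 for p in perms if p[:3] == (1, 2, 3))  # i, j, k are positions 0, 1, 2
    return Fraction(good, len(perms))


if __name__ == "__main__":
    print(respect_probability(6), respect_probability_brute_force(6))
    print(float(respect_probability(3 * 10 ** 6)))  # approaches 1/27
```

The probability tends to $1/27$ as $n$ grows, so $O(1)$ independent repetitions suffice.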

2.3 Motivating Example 

Quantum Walk for Element Distinctness. In the groundbreaking work of Ambainis [Amb04], which inspired a series of quantum walk frameworks [Sze04, MNRS11, JKM13] leading up to this work, a quantum walk for solving Element Distinctness was presented. This walk takes place on a Johnson graph, $J(n, r)$, whose vertices are subsets of $[n]$ of size $r$, denoted $\binom{[n]}{r}$. In $J(n, r)$, two vertices $S, S'$ are adjacent if $|S \cap S'| = r - 1$. The data function is $D(S) = \{(i, x_i) : i \in S\}$. The diffusion step of this walk acts as

\[ |S\rangle |D(S)\rangle |0\rangle \mapsto |S\rangle |D(S)\rangle \frac{1}{\sqrt{r(n-r)}} \sum_{i \in S,\, j \in [n] \setminus S} |(S \setminus i) \cup j\rangle |D((S \setminus i) \cup j)\rangle . \]

We can perform this diffusion in two queries by performing the transformation

\[ |S\rangle |D(S)\rangle |0\rangle \mapsto |S\rangle |D(S)\rangle \frac{1}{\sqrt{r}} \sum_{i \in S} |(i, x_i)\rangle \frac{1}{\sqrt{n-r}} \sum_{j \in [n] \setminus S} |(j, x_j)\rangle . \]

We can reversibly map this to the desired state with no queries, and by using an appropriate encoding of $D$, we can make this time efficient as well.

^1 The requirement that this operation be controlled on the first two registers, i.e., that it always leaves the first two registers unchanged, is not explicitly stated in [MNRS11]. However, this condition is needed to prevent, for example, the action $|x\rangle |\psi\rangle |0, 0\rangle \mapsto |x\rangle |D(x)\rangle |\phi\rangle$, where $\langle \psi | D(x) \rangle = 0$ and $|x\rangle |D(x)\rangle |\phi\rangle$ is a possible state of the algorithm. In this case, $(\text{Local Diffusion})\, \mathrm{ref}_{|0,0\rangle}\, (\text{Local Diffusion})^\dagger$ would not act as $W(P)$ on $|x\rangle |D(x)\rangle |\phi\rangle$.

To complete the description of this algorithm, we describe the marked set and checking procedure. We deviate slightly from the usual quantum walk algorithm of [Amb04] and instead describe a variation that is analogous to the learning graph for Element Distinctness [Bel12b]. We say a vertex $S$ is marked if it contains an index $i$ such that there exists $j \in [n] \setminus \{i\}$ with $x_i = x_j$ (whereas in [Amb04] both $i$ and $j$ must be in $S$). To check if $S$ is marked, we simply search over $[n] \setminus S$ for such a $j$, in cost $O(\sqrt{n})$. This does not give asymptotically better performance than [Amb04], but it is more analogous to the 3-Distinctness algorithm we attempt to construct in the remainder of this section, and then succeed in constructing in Section 4.
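To see why this variation has the same asymptotic performance, one can plug rough parameters into the cost expression of Theorem 2.1. The sketch below is a back-of-the-envelope calculation under assumptions that are ours, with all constants dropped: setup cost $r$ queries, update cost 2 queries, checking cost $\sqrt{n}$ queries, marked fraction $\varepsilon \approx r/n$, and spectral gap $\delta \approx 1/r$ for $J(n, r)$.

```python
import math


def walk_search_cost(S, U, C, eps, delta):
    """Cost S + (1/sqrt(eps)) * ((1/sqrt(delta)) * U + C), constants dropped."""
    return S + (U / math.sqrt(delta) + C) / math.sqrt(eps)


def element_distinctness_cost(n, r):
    """Query cost of the variant walk on J(n, r) under the assumed parameters."""
    return walk_search_cost(S=r, U=2, C=math.sqrt(n), eps=r / n, delta=1 / r)


if __name__ == "__main__":
    n = 10 ** 6
    for r in (10 ** 3, 10 ** 4, 10 ** 5):
        print(r, round(element_distinctness_cost(n, r)))
```

The expression simplifies to $r + 2\sqrt{n} + n/\sqrt{r}$, which is balanced at $r = n^{2/3}$, giving $O(n^{2/3})$ queries.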

Attempting a Quantum Walk for 3-Distinctness. We now attempt to construct an anal- 
ogous algorithm for 3-Distinctness. Conceptually, the approach is simple, but successfully imple- 
menting the simple idea is nontrivial. The idea is to walk on a Johnson graph of sets of collision 
pairs, analogous to the set of queried indices in the Element Distinctness walk described above. 
The checking step is then similar to that of the above walk: simply search for a third element that 
forms a 3-collision with one of the 2-collisions in the set. For the update step, we need to replace 
one of the collision pairs in the set using a subroutine that finds a 2-collision. We now describe 
the difficulty of implementing this step efficiently, despite having an optimal Element Distinctness 
algorithm at our disposal. Section 3 presents a framework that allows us to successfully implement the idea in Section 4.

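Let $\mathcal{P}$ denote the set of collision pairs in the input, and $n_2 = |\mathcal{P}|$. We walk on $J(n_2, s_2)$, with each vertex $S_2$ corresponding to a set of $s_2$ collision pairs. The diffusion for this walk is the map

\[ |S_2, D(S_2)\rangle |0\rangle \mapsto |S_2, D(S_2)\rangle \frac{1}{\sqrt{s_2(n_2 - s_2)}} \sum_{(i,i') \in S_2,\, (j,j') \in \mathcal{P} \setminus S_2} |(S_2 \setminus (i,i')) \cup (j,j')\rangle |D((S_2 \setminus (i,i')) \cup (j,j'))\rangle . \]

To accomplish this, we need to generate $\frac{1}{\sqrt{s_2}} \sum_{(i,i') \in S_2} |(i, i', x_i)\rangle$ and $\frac{1}{\sqrt{n_2 - s_2}} \sum_{(j,j') \in \mathcal{P} \setminus S_2} |(j, j', x_j)\rangle$. The first superposition is easy to generate, since we have $S_2$, but the second is more difficult since we have to find new collisions.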

The obvious approach is to use the quantum walk algorithm for Element Distinctness as a subroutine. However, this algorithm does not return the desired superposition over collisions; rather, it returns a superposition over sets that contain a collision. That is, we have the state $\frac{1}{\sqrt{n_2}} \sum_{(i,i') \in \mathcal{P}} |(i, i', x_i)\rangle |\psi(i,i')\rangle$ for some garbage $|\psi(i,i')\rangle$. The garbage may be only slightly entangled with $(i, i')$, but even this small amount of error in the state is prohibitive. Since we must call the update subroutine many times, we need the error to be very small. Unlike for nested checking, where bounded-error subroutines are sufficient, we cannot amplify the success probability of an update operator. We cannot directly use the state returned by the Element Distinctness algorithm for several reasons. First, we cannot append garbage each time we update, as this would prevent proper interference in the walk. Second, when we use a nested walk for the update step, we would like to use the same trick as in nested checking: putting a copy of the starting state for the nested walk in the data structure so that we only need to perform the inner setup once. To do the same here, we would need to preserve the inner walk starting state; in other words, the update would need to output some state close to $\binom{n}{s_1}^{-1/2} \sum_{S_1 \in \binom{[n]}{s_1}} |S_1\rangle$. While we might try to recycle the garbage to produce this state, it is unclear how to extract the part we need for the update coherently, let alone without damaging the rest of the state.

This appears to be a problem for any approach that directly uses a quantum walk for the update, 
since all known quantum walks use some variant of a Johnson graph. Our modified framework 
circumvents this issue by allowing us to do the update with some garbage, which we then uncompute. 
This lets us use a quantum walk subroutine, with setup performed only at the beginning of the 
algorithm, to accomplish the update step. More generally, using our modified framework, we can 
tolerate updates that have garbage for any reason, whether the garbage is the result of the update 
being implemented by a quantum walk, or by some other quantum subroutine. 

3 Quantum Walks with Nested Updates 
3.1 Coin-Dependent Data 

A quantum analog of a discrete-time random walk on a graph can be constructed as a unitary process on the directed edges. For an edge $\{x, y\}$, we may have a state $|x\rangle |y\rangle$, where $|x\rangle$ represents the current vertex and $|y\rangle$ represents the coin or next vertex. In the framework of [MNRS11], some data function on the vertices is employed to help implement the search algorithm. We modify the quantum walk framework to allow this data to depend on both the current vertex and the coin, so that it is a function of the directed edges, which seems natural in hindsight. We show that this point of view has algorithmic applications. In particular, this modification enables efficient nested updates.

In the rest of the paper, let $P$ be a reversible, ergodic Markov chain on a connected, undirected graph $G = (X, E)$ with stationary distribution $\pi$ and spectral gap $\delta > 0$.

Let $0 \notin X$. Let $D : (X \times \{0\}) \cup \vec{E} \to \mathcal{D}$ for some Hilbert space $\mathcal{D}$. A quantum analog of $P$ with coin-dependent data structures can be implemented using three operations, as in [MNRS11], but the update now has three parts. The first corresponds to Local Diffusion from the framework of [MNRS11], as described in Section 2.1. The others are needed because of the new coin-dependent data.

Update cost: Let $\mathsf{U}$ be the cost of implementing

• Local Diffusion: $|x, 0\rangle |D(x, 0)\rangle \mapsto \sum_{y \in X} \sqrt{P(x,y)}\, |x, y\rangle |D(x, y)\rangle$ for all $x \in X$;

• The $(X, 0)$-Phase Flip: $|x, 0\rangle |D(x, 0)\rangle \mapsto -|x, 0\rangle |D(x, 0)\rangle$ for all $x \in X$, and the identity on the orthogonal subspace; and

• The Database Swap: $|x, y\rangle |D(x, y)\rangle \mapsto |y, x\rangle |D(y, x)\rangle$ for all $(x, y) \in \vec{E}$.

By cost, we mean any desired measure of complexity such as queries, time, or space. We also naturally extend the setup and checking costs as follows, where $M \subseteq X$ is a set of marked vertices.

Setup cost: Let $\mathsf{S}$ be the cost of constructing

\[ |\pi\rangle := \sum_{x \in X} \sqrt{\pi(x)} \sum_{y \in X} \sqrt{P(x,y)}\, |x, y\rangle |D(x, y)\rangle . \]

Checking cost: Let $\mathsf{C}$ be the cost of the reflection

\[ |x, y\rangle |D(x, y)\rangle \mapsto \begin{cases} -|x, y\rangle |D(x, y)\rangle & \text{if } x \in M \\ |x, y\rangle |D(x, y)\rangle & \text{otherwise,} \end{cases} \qquad \forall (x, y) \in \vec{E}. \]

Observe that $|\pi\rangle^0 := \sum_{x \in X} \sqrt{\pi(x)}\, |x, 0\rangle |D(x, 0)\rangle$ can be mapped to $|\pi\rangle$ by the Local Diffusion, which has cost $\mathsf{U} < \mathsf{S}$, so we can also consider $\mathsf{S}$ to be the cost of constructing $|\pi\rangle^0$.

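Theorem 3.1. Let $P$ be a Markov chain on $G = (X, E)$ with spectral gap $\delta > 0$, and let $D$ be a coin-dependent data structure for $P$. Let $M \subseteq X$ satisfy $\Pr_{x \sim \pi}(x \in M) \geq \varepsilon > 0$ whenever $M \neq \emptyset$. Then there is a quantum algorithm that finds an element of $M$, if $M \neq \emptyset$, with bounded error and with cost

\[ O\left(\mathsf{S} + \frac{1}{\sqrt{\varepsilon}}\left(\frac{1}{\sqrt{\delta}}\mathsf{U} + \mathsf{C}\right)\right). \]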

Proof. Our quantum walk algorithm is nearly identical to that of [MNRS11], so the proof of this theorem is also very similar. Just as in [MNRS11], we define a walk operator, $W(P)$, and analyze its spectral properties. Let $A := \mathrm{span}\{\sum_{y \in X} \sqrt{P(x,y)}\, |x, y\rangle |D(x, y)\rangle : x \in X\}$ and define $W(P) := ((\text{Database Swap}) \cdot \mathrm{ref}_A)^2$, where $\mathrm{ref}_A$ denotes the reflection about $A$.

As in [MNRS11], we can define $\mathcal{H} := \mathrm{span}\{|x, y\rangle : (x, y) \in (X \times \{0\}) \cup \vec{E}\}$ and $\mathcal{H}_D := \mathrm{span}\{|x, y, D(x, y)\rangle : (x, y) \in (X \times \{0\}) \cup \vec{E}\}$. As in [MNRS11], there is a natural isomorphism $|x, y\rangle \mapsto |x, y\rangle_D = |x, y, D(x, y)\rangle$, and $\mathcal{H}_D$ is invariant under both $W(P)$ and the checking operation. Thus, the spectral analysis may be done in $\mathcal{H}$, on states without data, exactly as in [MNRS11]. However, there are some slight differences in how we implement $W(P)$, which we now discuss.

The first difference is easy to see: in [MNRS11], the Database Swap can be accomplished trivially by a SWAP operation, mapping $|x\rangle |y\rangle |D(x)\rangle |D(y)\rangle$ to $|y\rangle |x\rangle |D(y)\rangle |D(x)\rangle$, whereas in our case, there may be a nontrivial cost associated with the mapping $|D(x, y)\rangle \mapsto |D(y, x)\rangle$, which we must include in the calculation of the update cost.

The second difference is more subtle. In [MNRS11], $\mathrm{ref}_A$ is implemented by applying $(\text{Local Diffusion})^\dagger$, reflecting about $|0, D(0)\rangle$ (since the data only refers to a vertex) in the coin register, and then applying $(\text{Local Diffusion})$. It is simple to reflect about $|0, D(0)\rangle$, since $|D(0)\rangle = |0\rangle$ in the formalism of [MNRS11]. In [MNRS11], this reflection is sufficient, because the operation $(\text{Local Diffusion})^\dagger$ fixes the vertex and its data, $|x\rangle |D(x)\rangle$, so in particular, it is still in the space $\mathrm{span}\{|x\rangle |D(x)\rangle : x \in X\}$. The register containing the coin and its data, $|y\rangle |D(y)\rangle$, may be moved out of this space by $(\text{Local Diffusion})^\dagger$, so we must reflect about $|0\rangle |D(0)\rangle$, but this is straightforward.
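The identity behind this implementation, namely that conjugating a reflection about the initial coin state by a controlled diffusion yields the reflection about $A$, can be verified numerically on a toy example. The sketch below is our own illustration, not the paper's construction: it uses two vertices, a two-dimensional coin register, no data register, and real rotations as the controlled diffusion, so that the adjoint is simply the transpose.

```python
import math


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def local_diffusion(thetas):
    """Block-diagonal real unitary: controlled on vertex x, rotate the coin by thetas[x].
    Basis ordering: |x, c> has index 2 * x + c."""
    n = len(thetas)
    u = [[0.0] * (2 * n) for _ in range(2 * n)]
    for x, t in enumerate(thetas):
        c, s = math.cos(t), math.sin(t)
        u[2 * x][2 * x], u[2 * x][2 * x + 1] = c, -s
        u[2 * x + 1][2 * x], u[2 * x + 1][2 * x + 1] = s, c
    return u


thetas = [0.3, 1.1]          # arbitrary toy angles (assumption)
dim = 2 * len(thetas)
LD = local_diffusion(thetas)

# Reflection about coin state |0>, acting as the identity on the vertex register.
ref_coin0 = [[(1.0 if i % 2 == 0 else -1.0) if i == j else 0.0
              for j in range(dim)] for i in range(dim)]
conjugated = matmul(matmul(LD, ref_coin0), transpose(LD))

# Direct reflection about A = span{ LD|x,0> : x }: 2 * sum_x |a_x><a_x| - I.
ref_A = [[-1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
for x in range(len(thetas)):
    a = [LD[i][2 * x] for i in range(dim)]   # column of LD, i.e. LD|x,0>
    for i in range(dim):
        for j in range(dim):
            ref_A[i][j] += 2 * a[i] * a[j]

max_diff = max(abs(conjugated[i][j] - ref_A[i][j])
               for i in range(dim) for j in range(dim))

if __name__ == "__main__":
    print(max_diff)
```

The check relies on the diffusion being controlled on the vertex register; this is the same condition that footnote 1 singles out.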

With coin-dependent data, a single register $|D(x, 0)\rangle$ holds the data for both the vertex and its coin, and the operation $(\text{Local Diffusion})^\dagger$ may take the coin as well as the entire data register out of the space $\mathcal{H}_D$, so we need to reflect about $|0\rangle |D(x, 0)\rangle$, which is not necessarily defined to be $|0\rangle |0\rangle$. This explains why the cost of the $(X, 0)$-Phase Flip is also part of the update cost. In summary, we implement $W(P)$ by $((\text{Database Swap}) \cdot (\text{Local Diffusion}) \cdot ((X, 0)\text{-Phase Flip}) \cdot (\text{Local Diffusion})^\dagger)^2$. □

3.2 Nested Updates



We show how to implement efficient nested updates using the coin-dependent data framework. Let $C : X \cup \{0\} \to \mathcal{C}$ be some coin-independent data structure (that will be a part of the final data structure) with $|C(0)\rangle = |0\rangle$, where we can reflect about $\mathrm{span}\{|x\rangle |C(x)\rangle : x \in M\}$ in cost $\mathsf{C}_C$. In the motivating example, if $x = S_2$ is a set of collision pairs, then $C(S_2)$ stores their query values.

Fix $x \in X$. Let $P^x$ be a walk on a graph $G^x = (V^x, E^x)$ with stationary distribution $\pi^x$ and marked set $M^x \subset V^x$. We use this walk to perform Local Diffusion over $|x\rangle$. Let $d^x$ be the data for this walk.

When there is ambiguity, we specify the data structure with a subscript. For instance, $|\pi\rangle_D = \sum_{x, y \in X} \sqrt{\pi(x) P(x,y)}\, |x, y\rangle |D(x, y)\rangle$ and $|\pi\rangle^0_C = \sum_{x \in X} \sqrt{\pi(x)}\, |x, 0\rangle |C(x), 0\rangle$. Similarly, $\mathsf{S}_C$ is the cost to construct the state $|\pi\rangle_C$.

Definition 3.2. The family $(P^x, M^x, d^x)_{x \in X}$ implements the Local Diffusion and Database Swap of $(P, C)$ with cost $\mathsf{T}$ if the following two maps can be implemented with cost $\mathsf{T}$:

Local Diffusion with Garbage: For some garbage states $(|\psi(x, y)\rangle)_{(x, y) \in \vec{E}}$, an operation controlled on the vertex $x$ and $C(x)$, acting as

\[ |x, 0\rangle |C(x), 0\rangle |\pi^x(M^x)\rangle_{d^x} \mapsto \sum_{y \in X} \sqrt{P(x,y)}\, |x, y\rangle |C(x), C(y)\rangle |\psi(x, y)\rangle ; \]

Garbage Swap: For any edge $(x, y) \in \vec{E}$,

\[ |x, y\rangle |C(x), C(y)\rangle |\psi(x, y)\rangle \mapsto |y, x\rangle |C(y), C(x)\rangle |\psi(y, x)\rangle . \]

The data structure of the implementation is $|D(x, 0)\rangle = |C(x), 0\rangle |\pi^x(M^x)\rangle_{d^x}$ for all $x \in X$ and $|D(x, y)\rangle = |C(x), C(y)\rangle |\psi(x, y)\rangle$ for any edge $(x, y) \in \vec{E}$.

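Theorem 3.3. Let $P$ be a reversible, ergodic Markov chain on $G = (X, E)$ with spectral gap $\delta > 0$, and let $C$ be a data structure for $P$. Let $M \subseteq X$ be such that $\Pr_{x \sim \pi}(x \in M) \geq \varepsilon$ for some $\varepsilon > 0$ whenever $M \neq \emptyset$. Let $(P^x, M^x, d^x)_{x \in X}$ be a family implementing the Local Diffusion and Database Swap of $(P, C)$ with cost $\mathsf{T}$, and let $\mathsf{S}', \mathsf{U}', \mathsf{C}', 1/\varepsilon', 1/\delta'$ be upper bounds on the costs and parameters associated with each of the $(P^x, M^x, d^x)$. Then there is a quantum algorithm that finds an element of $M$, if $M \neq \emptyset$, with bounded error and with cost

\[ \tilde{O}\left(\mathsf{S}_C + \mathsf{S}' + \frac{1}{\sqrt{\varepsilon}}\left(\frac{1}{\sqrt{\delta}}\left(\frac{1}{\sqrt{\varepsilon'}}\left(\frac{1}{\sqrt{\delta'}}\mathsf{U}' + \mathsf{C}'\right) + \mathsf{T}\right) + \mathsf{C}_C\right)\right). \]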

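Proof. We achieve this upper bound using the quantization of $P$ with the data structure of the implementation, $D$. We must compute the cost of the setup, update, and checking operations associated with this walk.

Checking: The checking cost $\mathsf{C} = \mathsf{C}_D$ is the cost to reflect about $\mathrm{span}\{|x\rangle |y\rangle |D(x, y)\rangle : x \in M\} = \mathrm{span}\{|x\rangle |y\rangle |C(x), C(y)\rangle |\psi(x, y)\rangle : x \in M\}$. We can implement this in $\mathcal{H}_D$ by reflecting about $\mathrm{span}\{|x\rangle |C(x)\rangle : x \in M\}$, which costs $\mathsf{C}_C$.

Setup: Recall that $|C(0)\rangle = |0\rangle$. The setup cost $\mathsf{S} = \mathsf{S}_D$ is the cost of constructing the state

\[ \sum_{x \in X} \sqrt{\pi(x)}\, |x\rangle |0\rangle |D(x, 0)\rangle = \sum_{x \in X} \sqrt{\pi(x)}\, |x\rangle |0\rangle |C(x), 0\rangle |\pi^x(M^x)\rangle . \]

We do this as follows. We first construct $\sum_{x \in X} \sqrt{\pi(x)}\, |x, 0\rangle |C(x), 0\rangle$ in cost $\mathsf{S}_C$. Next, we apply the mapping $|x\rangle \mapsto |x\rangle |\pi^x\rangle$ in cost $\mathsf{S}'$. Finally, we use the quantization of $P^x$ to perform the mapping $|x\rangle |\pi^x\rangle \mapsto |x\rangle |\pi^x(M^x)\rangle$ in cost $\frac{1}{\sqrt{\varepsilon'}}\left(\frac{1}{\sqrt{\delta'}}\mathsf{U}' + \mathsf{C}'\right)$. The full setup cost is then $\mathsf{S} = \mathsf{S}_C + \mathsf{S}' + \frac{1}{\sqrt{\varepsilon'}}\left(\frac{1}{\sqrt{\delta'}}\mathsf{U}' + \mathsf{C}'\right)$.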

Update: The update cost has three contributions. The first is the Local Diffusion operation, which, by the definition of $D$, is exactly the Local Diffusion with Garbage operation. Similarly, the Database Swap is exactly the Garbage Swap, so these two operations have total cost $\mathsf{T}$. The $(X, 0)$-Phase Flip is simply a reflection about states of the form $|x\rangle |D(x, 0)\rangle = |x\rangle |C(x)\rangle |\pi^x(M^x)\rangle$. Given any $x \in X$, we can reflect about $|\pi^x(M^x)\rangle$ using the quantization of $P^x$ in cost $\frac{1}{\sqrt{\varepsilon'}}\left(\frac{1}{\sqrt{\delta'}}\mathsf{U}' + \mathsf{C}'\right)$ by running the algorithm of Theorem 3.1. In particular, we can run the walk backward to prepare the state $|\pi^x\rangle$, perform phase estimation on the walk operator to implement the reflection about this state, and then run the walk forward to recover $|\pi^x(M^x)\rangle$. However, this transformation is implemented approximately. To keep the overall error small, we need an accuracy of $O(1/\sqrt{\varepsilon \delta \varepsilon' \delta'})$, which leads to an overhead logarithmic in the required accuracy. The reflection about $|\pi^x(M^x)\rangle$, controlled on $|x\rangle$, is sufficient because Local Diffusion with Garbage is controlled on $|x\rangle |C(x)\rangle$, and so it leaves these registers unchanged. Since we apply the $(X, 0)$-Phase Flip just after applying $(\text{Local Diffusion})^\dagger$ (see the proof of Theorem 3.1) to a state in $\mathcal{H}_D$, we can guarantee that these registers contain $|x\rangle |C(x)\rangle$ for some $x \in X$. The total update cost (up to log factors) is $\mathsf{U} = \mathsf{T} + \frac{1}{\sqrt{\varepsilon'}}\left(\frac{1}{\sqrt{\delta'}}\mathsf{U}' + \mathsf{C}'\right)$.

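Finally, the full cost of the quantization of $P$ (up to log factors) is

\[ \mathsf{S} + \frac{1}{\sqrt{\varepsilon}}\left(\frac{1}{\sqrt{\delta}}\mathsf{U} + \mathsf{C}\right) = \tilde{O}\left(\mathsf{S}_C + \mathsf{S}' + \frac{1}{\sqrt{\varepsilon}}\left(\frac{1}{\sqrt{\delta}}\left(\frac{1}{\sqrt{\varepsilon'}}\left(\frac{1}{\sqrt{\delta'}}\mathsf{U}' + \mathsf{C}'\right) + \mathsf{T}\right) + \mathsf{C}_C\right)\right), \]

where the inner-walk term of the setup cost is absorbed into the update term because $1/\sqrt{\varepsilon\delta} \geq 1$.

If $\mathsf{T} = 0$ (as may be the case, e.g., when the notion of cost is query complexity), then the expression is exactly what we would have liked for nested updates.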
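The way the three costs from the proof combine can be written out as a short calculation. The sketch below is illustrative: the function and parameter names are ours, constants and log factors are dropped, and the numbers in the usage example are arbitrary.

```python
import math


def walk_search_cost(S, U, C, eps, delta):
    """S + (1/sqrt(eps)) * ((1/sqrt(delta)) * U + C)."""
    return S + (U / math.sqrt(delta) + C) / math.sqrt(eps)


def nested_update_cost(S_C, C_C, T, eps, delta, S_in, U_in, C_in, eps_in, delta_in):
    """Compose the setup, update and checking costs derived in the proof of Theorem 3.3."""
    inner = (U_in / math.sqrt(delta_in) + C_in) / math.sqrt(eps_in)
    setup = S_C + S_in + inner      # S = S_C + S' + inner-walk cost
    update = T + inner              # U = T + inner-walk cost
    checking = C_C                  # C = C_C
    return walk_search_cost(setup, update, checking, eps, delta)


def desired_cost(S_C, C_C, eps, delta, S_in, U_in, C_in, eps_in, delta_in):
    """The target expression S + S' + (1/sqrt(eps)) * ((1/sqrt(delta)) * inner + C)."""
    inner = (U_in / math.sqrt(delta_in) + C_in) / math.sqrt(eps_in)
    return S_C + S_in + (inner / math.sqrt(delta) + C_C) / math.sqrt(eps)


if __name__ == "__main__":
    params = dict(S_C=10, C_C=3, eps=0.01, delta=0.04,
                  S_in=5, U_in=1, C_in=2, eps_in=0.25, delta_in=0.25)
    print(nested_update_cost(T=0, **params), desired_cost(**params))
```

With $\mathsf{T} = 0$ the two values differ only by the single inner-walk term in the setup, which is never larger than the update contribution, so they agree up to a factor of 2.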

4 Application: Quantum Query Complexity of 3-Distinctness 

In this section we prove the following theorem. 

Theorem 4.1. The quantum query complexity of 3-Distinctness is $\tilde{O}(n^{5/7})$.

We begin by giving a high-level description of the quantum walk algorithm before describing 
the implementation of each required procedure and their costs. First we define some notation. 

For any set $S_1 \subseteq \mathcal{A}_1 \cup \mathcal{A}_2$, let $\mathcal{P}(S_1) := \{(i, j) \in \mathcal{A}_1 \times \mathcal{A}_2 : i, j \in S_1, i \neq j, x_i = x_j\}$ be the set of 2-collisions in $S_1$ and for any set $S_2 \subset \mathcal{A}_1 \times \mathcal{A}_2$, let $\mathcal{I}(S_2) := \bigcup_{(i,j) \in S_2} \{i, j\}$ be the set of indices that are part of pairs in $S_2$. In general, we only consider 2-collisions in $\mathcal{A}_1 \times \mathcal{A}_2$; other 2-collisions in $x$ are ignored. For any pair of sets $A, B$, let $\mathcal{P}(A, B) := \{(i, j) \in A \times B : i \neq j, x_i = x_j\}$ be the set of 2-collisions between $A$ and $B$. For convenience, we define $\mathcal{P} := \mathcal{P}(\mathcal{A}_1, \mathcal{A}_2)$. Let $n_2 := |\mathcal{P}|$ be the size of this set. For any set $S_2 \subseteq \mathcal{P}$, we denote the set of queried values by $Q(S_2) := \{(i, j, x_i) : (i, j) \in S_2\}$. Similarly, for any set $S_1 \subset [n]$, we denote the set of queried values by $Q(S_1) := \{(i, x_i) : i \in S_1\}$.
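The notation above translates directly into set comprehensions. The following sketch is only meant to make the definitions concrete; the function names and the tiny example input are ours, with the input stored as a dictionary from 1-based indices to values.

```python
def collisions_between(x, A, B):
    """P(A, B): pairs (i, j) in A x B with i != j and x_i = x_j."""
    return {(i, j) for i in A for j in B if i != j and x[i] == x[j]}


def collisions_in(x, S1, A1, A2):
    """P(S1): 2-collisions (i, j) in A1 x A2 with both indices in S1."""
    return collisions_between(x, A1 & S1, A2 & S1)


def indices_of(S2):
    """I(S2): all indices that are part of some pair in S2."""
    return {idx for pair in S2 for idx in pair}


def queried_pairs(x, S2):
    """Q(S2) = {(i, j, x_i) : (i, j) in S2}."""
    return {(i, j, x[i]) for (i, j) in S2}


def queried_indices(x, S1):
    """Q(S1) = {(i, x_i) : i in S1}."""
    return {(i, x[i]) for i in S1}


if __name__ == "__main__":
    x = {1: 5, 2: 7, 3: 5, 4: 7, 5: 5, 6: 9}   # toy input with 3-collision {1, 3, 5}
    A1, A2, A3 = {1, 2}, {3, 4}, {5, 6}
    P = collisions_between(x, A1, A2)
    print(P, len(P), indices_of(P), queried_pairs(x, P))
```

In this toy input the 3-collision $\{1, 3, 5\}$ respects the tripartition, and the pair $(1, 3)$ is the 2-collision that the walk would need to find.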
4.1 High-Level Description of the Walk



The Walk. Our overall strategy is to find a 2-collision $(i, j) \in \mathcal{A}_1 \times \mathcal{A}_2$ such that $\exists k \in \mathcal{A}_3$ with $\{i, j, k\}$ a 3-collision. Let $s_1, s_2 < n$ be parameters to be optimized. We walk on the vertices $X = \binom{\mathcal{P}}{s_2}$, with each vertex corresponding to a set of $s_2$ 2-collisions from $\mathcal{A}_1 \times \mathcal{A}_2$. A vertex is considered marked if it contains $(i, j)$ such that $\exists k \in \mathcal{A}_3$ with $\{i, j, k\}$ a 3-collision. Thus, if $M \neq \emptyset$, the proportion of marked vertices is $\varepsilon = \Omega(s_2/n_2)$.

To perform an update, we use an Element Distinctness subroutine that walks on $s_1$-sized subsets of $\mathcal{A}_1 \cup \mathcal{A}_2$. However, since $n_2$ is large by assumption, the expected number of collisions in a set of size $s_1$ is large if $s_1 \gg \sqrt{n}$, which we suppose holds. It would be a waste to take only one and leave the rest, so we replace multiple elements of $S_2$ in each step. This motivates using a generalized Johnson graph $J(n_2, s_2, m)$ for the main walk, where we set $m := s_1^2/n = O(s_2)$, the expected number of 2-collisions in a set of size $s_1$. In $J(n_2, s_2, m)$, two vertices $S_2$ and $S_2'$ are adjacent if $|S_2 \cap S_2'| = s_2 - m$, so we can move from $S_2$ to $S_2'$ by replacing $m$ elements of $S_2$ by $m$ distinct elements. Let $\Gamma(S_2)$ denote the set of vertices adjacent to $S_2$. The spectral gap of $J(n_2, s_2, m)$ is $\delta = \Omega(m/s_2)$.
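The adjacency rule of the generalized Johnson graph is easy to enumerate on a small ground set. The sketch below is a brute-force illustration with names of our choosing; it lists the neighbors of a vertex and checks their number against $\binom{s}{m}\binom{n-s}{m}$.

```python
from itertools import combinations
from math import comb


def neighbors(S, universe, m):
    """Vertices adjacent to S in the generalized Johnson graph:
    remove m elements of S and add m elements from outside S."""
    S = frozenset(S)
    outside = sorted(set(universe) - S)
    return {
        (S - frozenset(removed)) | frozenset(added)
        for removed in combinations(sorted(S), m)
        for added in combinations(outside, m)
    }


if __name__ == "__main__":
    universe = range(7)          # toy ground set of size n = 7
    S = {0, 1, 2}                # vertex of size s = 3
    nbrs = neighbors(S, universe, m=2)
    print(len(nbrs), comb(3, 2) * comb(4, 2))
```

Every neighbor shares exactly $s - m$ elements with the starting vertex, which is the adjacency condition $|S_2 \cap S_2'| = s_2 - m$ used for the main walk.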

The Update. To perform an update step on the vertex $S_2$, we use the Element Distinctness algorithm of [Amb04] as a subroutine, with some difference in how we define the marked set. Specifically, we use the subroutine to look for $m$ 2-collisions, with $m \gg 1$. Furthermore, we only want to find 2-collisions that are not already in $S_2$, so $P^{S_2}$ is a walk on $J(2n/3 - 2s_2, s_1)$, with vertices corresponding to sets of $s_1$ indices from $(\mathcal{A}_1 \cup \mathcal{A}_2) \setminus \mathcal{I}(S_2)$, and we consider a vertex marked if it contains at least $m$ pairs of indices that are 2-collisions (i.e., $M^{S_2} = \{S_1 \in \binom{(\mathcal{A}_1 \cup \mathcal{A}_2) \setminus \mathcal{I}(S_2)}{s_1} : |\mathcal{P}(S_1)| \geq m\}$).



The Data. We store the value $x_i$ with each $(i, j) \in S_2$ and $i \in S_1$, i.e., $|C(S_2)\rangle = |Q(S_2)\rangle$ and $|d^{S_2}(S_1, S_1')\rangle = |Q(S_1), Q(S_1')\rangle$. Although technically this is part of the data, it is classical and coin-independent, so it is straightforward. Furthermore, since $S_1$ is encoded in $Q(S_1)$ and $S_2$ in $Q(S_2)$, we simply write $|Q(S_1)\rangle$ instead of $|S_1, Q(S_1)\rangle$ and $|Q(S_2)\rangle$ instead of $|S_2, Q(S_2)\rangle$.

The rest of the data is what is actually interesting. We use the state $|\pi^{S_2}(M^{S_2})\rangle^0_{d^{S_2}}$ in the following instead of $|\pi^{S_2}(M^{S_2})\rangle_{d^{S_2}}$ since it is easy to map between these two states. For every $S_2 \in X$, let

\[ |D(S_2, 0)\rangle := |Q(S_2), 0\rangle |\pi^{S_2}(M^{S_2})\rangle^0_{d^{S_2}} = |Q(S_2)\rangle \frac{1}{\sqrt{|M^{S_2}|}} \sum_{S_1 \in M^{S_2}} |Q(S_1)\rangle \]

and for every edge $(S_2, S_2')$, let $|D(S_2, S_2')\rangle := |Q(S_2), Q(S_2')\rangle |\psi(S_2, S_2')\rangle$ where



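\[ |\psi(S_2, S_2')\rangle := \sqrt{\frac{\binom{n_2 - s_2}{m}}{|M^{S_2}|}} \sum_{S_1 \in \binom{(\mathcal{A}_1 \cup \mathcal{A}_2) \setminus \mathcal{I}(S_2 \cup S_2')}{s_1 - 2m} \,:\, \mathcal{P}(S_1) = \emptyset} |Q(S_1)\rangle . \tag{1} \]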



We define $|\psi(S_2, S_2')\rangle$ in this way precisely because it is what naturally occurs when we attempt to perform the diffusion.

4.2 Implementation and Cost Analysis

We now explain how to implement the walk described at the beginning of this section and analyze 
the costs of the associated operations. 

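We have assumed that we have some partition $\mathcal{A}_1, \mathcal{A}_2, \mathcal{A}_3$ of $[n]$, although we actually want to run our algorithm on a random partition. The starting state is a uniform superposition over sets of $s_2$ collision pairs across the bipartition $\mathcal{A}_1 \times \mathcal{A}_2$. Unfortunately, given $\mathcal{A}_1, \mathcal{A}_2$, we are unable to construct a valid starting state. However, we can generate a state-partition pair $(|\pi(\mathcal{A}_1, \mathcal{A}_2)\rangle, \mathcal{A}_1, \mathcal{A}_2)$ such that the distribution of $\mathcal{A}_1, \mathcal{A}_2, \mathcal{A}_3 := [n] \setminus (\mathcal{A}_1 \cup \mathcal{A}_2)$ is sufficiently random, and $|\pi(\mathcal{A}_1, \mathcal{A}_2)\rangle$ is a starting state for the partition $\mathcal{A}_1, \mathcal{A}_2, \mathcal{A}_3$.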

Theorem 4.2 (Outer walk setup cost $S_C$). The starting state of the outer walk,
$\binom{n_2}{s_2}^{-1/2} \sum_{S_2 \in \binom{\mathcal{P}(A_1, A_2)}{s_2}} |Q(S_2)\rangle$, can be constructed for random variables $A_1, A_2, A_3$ with
$|A_1| = |A_2| = |A_3| = n/3$, such that if $x$ has a unique 3-collision $\{i, j, k\}$, then $\Pr((i, j, k) \in A_1 \times A_2 \times A_3) = \Omega(1)$,
in $O(s_1 + s_2\sqrt{n/s_1})$ queries.

Proof. To begin, we choose a random initial tripartition $\tilde{A}_1, \tilde{A}_2, \tilde{A}_3$ of $[n]$ such that $|\tilde{A}_1| = \frac{n}{3} + s_1 - s_2$,
$|\tilde{A}_2| = \frac{n}{3}$, and $|\tilde{A}_3| = \frac{n}{3} - s_1 + s_2$. Our final sets $A_1, A_2, A_3$ are closely related to these sets,
but satisfy $|A_1| = |A_2| = |A_3| = n/3$. Let $n_2$ be the number of 2-collisions across $A_1 \times A_2$. We
first create a uniform superposition over all subsets of $\tilde{A}_1$ of size $s_1$ along with their query values,
$\binom{n/3 + s_1 - s_2}{s_1}^{-1/2} \sum_{I \in \binom{\tilde{A}_1}{s_1}} |Q(I)\rangle$, using $O(s_1)$ queries.

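For a set $I \in \binom{\tilde{A}_1}{s_1}$, let $H(I) \subseteq \tilde{A}_2$ denote the set $\{j \in \tilde{A}_2 : \exists i \in I, x_i = x_j\}$ of indices in
$\tilde{A}_2$ colliding with $I$. Next we repeatedly Grover search for indices in $H(I)$. For a uniform $I$, the
size of $H(I)$ is $\Omega(s_1)$ in expectation; more specifically, for most choices of $\tilde{A}_1$ and
$\tilde{A}_2$, we have $\Pr_I(|H(I)| \in \Omega(s_1)) \ge 1 - o(1)$. We can therefore consider only the part of the state
$\binom{n/3 + s_1 - s_2}{s_1}^{-1/2} \sum_{I : |H(I)| \ge \epsilon s_1} |Q(I)\rangle$, for a suitable constant $\epsilon$. Thus, we can use Grover search
to find and query $s_2$ elements of $H(I)$ in $O(s_2\sqrt{n/s_1})$ queries, obtaining a state close to

$$\binom{n/3 + s_1 - s_2}{s_1}^{-1/2} \sum_{I : |H(I)| \ge \epsilon s_1} |Q(I)\rangle\, \binom{|H(I)|}{s_2}^{-1/2} \sum_{J \in \binom{H(I)}{s_2}} |Q(J)\rangle.$$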

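For a given $J$, we can partition the set $I$ into two disjoint sets: $I_1$, which contains all elements
in $I$ that do not collide with any element in $J$; and $I_2$, which contains elements that do collide with
an element in $J$. We can then combine $I_2$ with $J$ to get a set of $s_2$ collision pairs. The full reversible
mapping, which costs no additional queries because every value involved has already been queried, is
$|Q(I), Q(J)\rangle \mapsto |Q(I_1)\rangle\, |\{(i, j, x_i) : i \in I_2, j \in J, x_i = x_j\}\rangle$. Applying this
transformation gives (a state close to)

$$\binom{n/3 + s_1 - s_2}{s_1}^{-1/2} \sum_{I_1 : |H(I_1)| \ge \epsilon s_1 - s_2} \binom{|H(I_1)| + s_2}{s_2}^{-1/2} |Q(I_1)\rangle \sum_{S_2 \in \binom{\mathcal{P}(\tilde{A}_1 \setminus I_1, \tilde{A}_2)}{s_2}} |Q(S_2)\rangle.$$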

Note that this state is not uniform in $I_1$, but is uniform in $S_2$ when we restrict to a particular $I_1$.
Thus we measure the first register to get some $I_1$ with non-uniform probability that depends only
on $|H(I_1)|$. The remaining state is the uniform superposition

$$\binom{n_2}{s_2}^{-1/2} \sum_{S_2 \in \binom{\mathcal{P}(\tilde{A}_1 \setminus I_1, \tilde{A}_2)}{s_2}} |Q(S_2)\rangle.$$

Now let $A_1 = \tilde{A}_1 \setminus I_1$, $A_2 = \tilde{A}_2$ and $A_3 = \tilde{A}_3 \cup I_1$, where $\tilde{A}_1, \tilde{A}_2, \tilde{A}_3$ is the initial tripartition. Then we have $\mathcal{P} = \mathcal{P}(\tilde{A}_1 \setminus I_1, \tilde{A}_2) = \mathcal{P}(A_1, A_2)$,
so we have constructed the correct state for the tripartition $A_1, A_2, A_3$. Clearly, if $\{i, j, k\}$ is the
unique 3-collision, then $i \in \tilde{A}_1$, $j \in \tilde{A}_2$ and $k \in \tilde{A}_3$ with constant probability. It remains to consider
whether $i \in I_1$. Although the distribution of $I_1$ is non-uniform, the distribution restricted to those
$I_1$ with $|H(I_1)| = h$ is uniform for any fixed $h$, and it is easy to see that $\Pr(i \in I_1 \mid |H(I_1)| = h)$ is $o(1)$
for any $h$.

For more details, refer to the proof of Theorem 5.4, which also proves an analogous statement
for time complexity. □

Hereafter, we assume the above choice of partition $A_1, A_2, A_3$, and that if there is a unique
3-collision $\{i, j, k\}$, then $i \in A_1$, $j \in A_2$ and $k \in A_3$.

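Theorem 4.3 (Costs of the update walk $S'$, $\frac{1}{\sqrt{\varepsilon'}}(\frac{1}{\sqrt{\delta'}}U' + C')$). The update walk has query complexities
$S' = O(s_1)$ and $\frac{1}{\sqrt{\varepsilon'}}(\frac{1}{\sqrt{\delta'}}U' + C') = \tilde{O}(\sqrt{nm/s_1})$.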

Proof. Fix an arbitrary vertex $S_2 \in \binom{\mathcal{P}(A_1, A_2)}{s_2}$. We now analyze the update walk $P^{S_2}$. The walk is
still on $J(2n/3 - 2s_2, s_1)$, so $\delta' = \Omega(1/s_1)$, but in contrast to [Amb04], a vertex is considered marked if
it has at least $m$ collision pairs, and we have a lower bound of $n_2 = \Omega(n)$ on the number of disjoint
collision pairs. Since we defined $m = s_1^2 n_2/n^2$ as roughly the expected number of collision pairs
in a set of size $s_1$, we have $\varepsilon' = \Omega(1)$. We still need to do the walk, both to amplify the success
probability to inverse polynomial (which we could also have done by increasing $s_1$ by log factors)
and more importantly, to implement the phase flip $|x, 0\rangle |D(x, 0)\rangle \mapsto -|x, 0\rangle |D(x, 0)\rangle$.

Setup: We need $S' = O(s_1)$ queries to set up $\binom{2n/3 - 2s_2}{s_1}^{-1/2} \sum_{S_1 \in \binom{(A_1 \cup A_2) \setminus \mathcal{I}(S_2)}{s_1}} |Q(S_1)\rangle$.

Update: The update on $J(2n/3 - 2s_2, s_1)$ costs $O(1)$ queries.

Checking: The query complexity of checking is 0, since we merely observe whether there are $m$
colliding pairs in $S_1$.



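We can thus compute $\frac{1}{\sqrt{\varepsilon'}}(\frac{1}{\sqrt{\delta'}}U' + C') = \tilde{O}(\sqrt{s_1}) = \tilde{O}(\sqrt{nm/s_1})$, where the last equality uses $m = s_1^2 n_2/n^2$ with $n_2 = \Theta(n)$. □

The following lemma tells us that we do not need to reverse the garbage part of the data.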

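Lemma 4.4. For all edges $(S_2, S_2')$, $|\psi(S_2, S_2')\rangle = |\psi(S_2', S_2)\rangle$.

Proof. Recall the definition of $|\psi(S_2, S_2')\rangle$ from (1):

$$|\psi(S_2, S_2')\rangle = \sum_{S_1 \in \binom{(A_1 \cup A_2) \setminus \mathcal{I}(S_2 \cup S_2')}{s_1 - 2m}} \sqrt{\frac{\binom{n_2 - s_2}{m}}{|M^{S_2}|\, \binom{|\mathcal{P}(S_1)| + m}{m}}}\; |Q(S_1)\rangle.$$

To see that this is symmetric in $S_2$ and $S_2'$, we need only show that $|M^{S_2}| = |M^{S_2'}|$. We have

$$|M^{S_2}| = \binom{|\mathcal{P} \setminus S_2|}{m} \binom{|(A_1 \cup A_2) \setminus \mathcal{I}(S_2)| - 2m}{s_1 - 2m}.$$

This holds because all collisions in $A_1 \cup A_2$ are disjoint, and so choosing $m$ pairs from $\mathcal{P} \setminus S_2$
gives $2m$ distinct indices. We can easily see that $|\mathcal{P} \setminus S_2| = n_2 - s_2$, which is independent of
$S_2$. Less trivially, since all collisions in $S_2$ are disjoint, we have $|\mathcal{I}(S_2)| = 2s_2$ for all $S_2$, and so
$|(A_1 \cup A_2) \setminus \mathcal{I}(S_2)| = |A_1 \cup A_2| - 2s_2$, again independent of $S_2$. Thus we have $|M^{S_2}| = |M^{S_2'}|$,
completing the proof. □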

From this lemma it readily follows that the Garbage Swap requires no queries.

Theorem 4.5 (Garbage Swap cost). No queries are needed to perform the Garbage Swap,
which for any $(S_2, S_2')$ performs the map

$$|Q(S_2), Q(S_2')\rangle\, |\psi(S_2, S_2')\rangle \mapsto |Q(S_2'), Q(S_2)\rangle\, |\psi(S_2', S_2)\rangle.$$

The Local Diffusion with Garbage also requires no queries, but is nontrivial to implement.

Theorem 4.6 (Local Diffusion with Garbage cost). No queries are needed to perform the
Local Diffusion with Garbage, which, for any $S_2$, performs the map

$$|Q(S_2)\rangle\, |\pi^{S_2}(M^{S_2})\rangle^0 \mapsto \frac{1}{\sqrt{|\Gamma(S_2)|}} \sum_{S_2' \in \Gamma(S_2)} |Q(S_2), Q(S_2')\rangle\, |\psi(S_2, S_2')\rangle,$$

where $|\psi(S_2, S_2')\rangle$ is defined in (1).

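Proof. We employ the following procedure to perform the Local Diffusion with Garbage.

1. Perform $|Q(S_2), Q(S_1)\rangle \mapsto |Q(S_2), Q(S_1)\rangle\, \binom{s_2}{m}^{-1/2} \sum_{I \in \binom{S_2}{m}} |Q(I)\rangle$.

2. Perform $|Q(S_2), Q(S_1)\rangle \mapsto |Q(S_2), Q(S_1)\rangle\, \binom{|\mathcal{P}(S_1)|}{m}^{-1/2} \sum_{J \in \binom{\mathcal{P}(S_1)}{m}} |Q(J)\rangle$.

3. Perform $|Q(S_1)\rangle |Q(J)\rangle \mapsto |Q(\tilde{S}_1)\rangle |Q(J)\rangle$, where $\tilde{S}_1 = S_1 \setminus \mathcal{I}(J)$.

Here $I$ represents collision pairs to be removed from $S_2$ and $J$ represents collision pairs to be added.
It is clear that each of these operations has query complexity 0.

Now we show the correctness of this procedure. Recall that $S_1 \in M^{S_2}$ if and only if $S_1$ contains
at least $m$ collisions, i.e., $|\mathcal{P}(S_1)| \ge m$, so $|Q(S_2)\rangle\, |\pi^{S_2}(M^{S_2})\rangle^0$ is

$$|Q(S_2)\rangle\, \frac{1}{\sqrt{|M^{S_2}|}} \sum_{S_1 : |\mathcal{P}(S_1)| \ge m} |Q(S_1)\rangle.$$

After performing the above procedure, we get the state

$$|Q(S_2)\rangle\, \binom{s_2}{m}^{-1/2} \sum_{I \in \binom{S_2}{m}} |Q(I)\rangle \sum_{S_1 : |\mathcal{P}(S_1)| \ge m} \frac{1}{\sqrt{|M^{S_2}|\, \binom{|\mathcal{P}(S_1)|}{m}}} \sum_{J \in \binom{\mathcal{P}(S_1)}{m}} |Q(\tilde{S}_1)\rangle |Q(J)\rangle.$$

Note that $\mathcal{P}(S_1) = \mathcal{P}(\tilde{S}_1) \cup J$. To see this, we must appeal to the fact that all collisions in
$A_1 \times A_2$ are disjoint, by assumption, so for each collision pair $(i, j) \in J$, removing $i$ and $j$ from
$S_1$ removes the collision pair $(i, j)$ and no other collision pair from $\mathcal{P}(S_1)$. Next, we can see that
$|\mathcal{P}(\tilde{S}_1) \cup J| = |\mathcal{P}(\tilde{S}_1)| + m$, since $J \cap \mathcal{P}(\tilde{S}_1) = \emptyset$ and $|J| = m$. Thus, $|\mathcal{P}(S_1)| = |\mathcal{P}(\tilde{S}_1)| + m$, and
we can rewrite the state as

$$|Q(S_2)\rangle\, \frac{1}{\sqrt{\binom{s_2}{m}\binom{n_2 - s_2}{m}}} \sum_{I \in \binom{S_2}{m}} \sum_{J \in \binom{\mathcal{P} \setminus S_2}{m}} |Q(I)\rangle |Q(J)\rangle \sum_{\tilde{S}_1 \in \binom{(A_1 \cup A_2) \setminus \mathcal{I}(S_2 \cup J)}{s_1 - 2m}} \alpha_{\tilde{S}_1}(S_2, (S_2 \cup J) \setminus I)\, |Q(\tilde{S}_1)\rangle,$$

where

$$\alpha_{\tilde{S}_1}(S_2, (S_2 \cup J) \setminus I) = \sqrt{\frac{\binom{n_2 - s_2}{m}}{|M^{S_2}|\, \binom{|\mathcal{P}(\tilde{S}_1)| + m}{m}}}.$$

We now simply note that the neighbours of any $S_2 \in X$ are exactly $(S_2 \cup J) \setminus I$ for $I \in \binom{S_2}{m}$
and $J \in \binom{\mathcal{P} \setminus S_2}{m}$. Furthermore, for such a neighbour $S_2' = (S_2 \cup J) \setminus I$, $Q(S_2), Q(I), Q(J)$ encodes
$Q(S_2), Q(S_2')$. Finally, for such an $S_2'$, we have $S_2 \cup J = S_2 \cup S_2'$. Thus, we are left with the desired
state. □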
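The neighbour structure used in this proof can be checked by brute force on a toy instance. The following is a minimal classical sketch, not part of the algorithm itself: the helper name `neighbours` and the toy values ($n_2 = 5$ disjoint collision pairs, $s_2 = 3$, $m = 2$) are assumptions chosen for illustration. It enumerates every choice of $I$ and $J$ and forms $S_2' = (S_2 \cup J) \setminus I$.

```python
from itertools import combinations
from math import comb

def neighbours(S2, all_pairs, m):
    """Yield (I, J, S2') with S2' = (S2 | J) - I, where I is a set of m pairs
    removed from S2 and J is a set of m pairs taken from outside S2."""
    S2 = frozenset(S2)
    outside = sorted(set(all_pairs) - S2)
    for I in combinations(sorted(S2), m):
        for J in combinations(outside, m):
            I_set, J_set = frozenset(I), frozenset(J)
            yield I_set, J_set, (S2 | J_set) - I_set

# Hypothetical toy instance: five disjoint collision pairs (i, j).
pairs = [(0, 10), (1, 11), (2, 12), (3, 13), (4, 14)]
S2 = frozenset(pairs[:3])
nbrs = list(neighbours(S2, pairs, 2))
```

On this instance the number of neighbours equals $\binom{s_2}{m}\binom{n_2 - s_2}{m}$, every neighbour has size $s_2$, and $S_2 \cup J = S_2 \cup S_2'$ holds for each of them, matching the three facts used at the end of the proof.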

Corollary 4.7 (Local Diffusion and Database Swap cost $T$). The family $(P^{S_2}, M^{S_2}, d^{S_2})_{S_2 \in X}$
implements the Local Diffusion and Database Swap of $(P, Q)$ with no queries.

Proof. This is immediate from Theorems 4.5 and 4.6. □

The checking cost is immediate, since we can use Grover search to look for an element of $A_3$
that collides with any of the stored 2-collisions.

Theorem 4.8 (Checking cost $C$). We can implement the checking reflection with $C = O(\sqrt{n})$
queries.

We now have all necessary ingredients to prove the main theorem. 



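Proof of Theorem 4.1. We apply Theorem 3.3 to compute the cost of our nested-update quantum
walk algorithm, giving (up to log factors)

$$S_C + S' + \frac{1}{\sqrt{\varepsilon}}\left(\frac{1}{\sqrt{\delta}}\left(\frac{1}{\sqrt{\varepsilon'}}\left(\frac{1}{\sqrt{\delta'}}U' + C'\right) + T\right) + C\right)$$

$$= s_1 + s_2\sqrt{\frac{n}{s_1}} + s_1 + \sqrt{\frac{n}{s_2}}\left(\sqrt{\frac{s_2}{m}}\sqrt{\frac{nm}{s_1}} + \sqrt{n}\right),$$

using the cost calculations from Theorems 4.2, 4.3, and 4.8 and Corollary 4.7. Setting $s_1 = n^{5/7}$
and $s_2 = n^{4/7}$ gives query complexity $\tilde{O}(n^{5/7})$. □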
5 Time Complexity of 3-Distinctness



In this section we prove the following theorem.

Theorem 5.1. The time complexity of 3-Distinctness is $\tilde{O}(n^{5/7})$.

This follows fairly straightforwardly from the quantum walk described in Section 4. The only
remaining task is to describe how we can encode the sets of queried indices and pairs of indices so
that all necessary operations, such as inserting an element in a set or removing an element from a
set, can be done in poly-logarithmic time. We use the same data structure that was used to obtain
a tight upper bound on the time complexity of Element Distinctness [Amb04]. After describing the
necessary properties of this data structure and how we apply it to our walk, we explain how each
of the operations described in Section 4 can be done time-efficiently using this encoding.

The following lemma describes properties of the data structure that we use to encode edges of
our walk and their data. We refer to this data structure as a skip-table.

Lemma 5.2 ([Amb04]). There exists a data structure for storing a set of items of the form $(i, x)$
(the $x$ values need not be unique) that allows the following operations to be performed in worst case
time complexity $O(\log^4(n + q))$: insert an item; delete an item; look up an item by its $x$ value;
or create a superposition of the elements stored. The data structure storing a set $S$ is a unique
encoding of $S$.

Encoding an Edge and its Data. We now describe how to encode an edge and its data.
These states have the form either $|S_2, S_2', D(S_2, S_2')\rangle$, where $|D(S_2, S_2')\rangle$ is a superposition over
basis states $|Q(S_2), Q(S_2'), Q(S_1)\rangle$ for $(S_2, S_2') \in E$ and $S_1 \subset (A_1 \cup A_2) \setminus \mathcal{I}(S_2 \cup S_2')$ (and recall that
$Q(S_2), Q(S_2')$ automatically encodes $S_2, S_2'$); or $|S_2, 0, D(S_2, 0)\rangle$, where $|D(S_2, 0)\rangle$ is a superposition
over basis states $|Q(S_2), Q(S_1), Q(S_1')\rangle$ for $S_2 \in X$, and $(S_1, S_1') \in E^{S_2} \cup (V^{S_2} \times \{0\})$. Strictly
speaking, we previously defined $|D(S_2, 0)\rangle$ using $|\pi^{S_2}(M^{S_2})\rangle^0_{S_2}$ instead of $|\pi^{S_2}(M^{S_2})\rangle_{S_2}$, that is,
as a superposition of $|Q(S_2), Q(S_1), 0\rangle$ for $S_2 \in X$, and $S_1 \in V^{S_2}$. However, we must also consider
how to encode states of the nested update walk, which do not generally have 0 in the coin register.

We begin by encoding the triple of sets $(Q(S_2), Q(S_1), Q(S_1'))$. We store each of $Q(S_2)$ and $Q(S_1)$
in a skip-table. To store $Q(S_1')$ for $S_1' \ne 0$, we simply store both $Q(S_1) \setminus Q(S_1')$ and $Q(S_1') \setminus Q(S_1)$,
each of which is a single queried index $(i, x_i)$. This already encodes the three sets, but we add
additional structure to speed up certain tasks. We store $Q(\mathcal{P}(S_1))$, the set of 2-collisions in $S_1$,
in another skip-table. We also keep a counter of the size of this set so that we can easily check
whether $S_1$ is marked.

Now we describe how we encode the triple of sets $(Q(S_2), Q(S_2'), Q(S_1))$. We store each of $Q(S_2)$
and $Q(S_1)$ in a skip-table. To store $Q(S_2')$, we store each of $Q(S_2) \setminus Q(S_2')$ and $Q(S_2') \setminus Q(S_2)$ in a
skip-table. We also store with $Q(S_1)$ an additional skip-table containing $Q(\mathcal{P}(S_1))$, with a counter
encoding its size. We store this simply because this is what is left over from the encoding of $Q(S_1)$
after we perform Local Diffusion.

It is now clear from Lemma 5.2 that we can perform the following operations in poly-logarithmic
time: insert an element into $S_1$, insert an element into $S_2$, delete an element from $S_1$, delete an element
from $S_2$, look up an element in $S_1$, and look up an element in $S_2$. In addition, we can perform each
of these operations in superposition.
Cost Analysis. We now analyze the cost of all operations implemented in Section 4.

Since we are now concerned with time complexity, we need some efficient way to store and
compute the partition $A_1, A_2, A_3$. We use the following notion to efficiently represent a random
subset of $[n]$.

Definition 5.3. A family $F$ of functions $f : [n] \to [\ell]$ is said to be $k$-wise independent if for any
distinct $i_1, \ldots, i_k \in [n]$, the distribution of $(f(i_1), \ldots, f(i_k))$, for $f$ drawn uniformly from $F$, is identical to the uniform distribution
over $[\ell]^k$.

When $n = \ell$ is a prime power, a simple example of a $k$-wise independent family of functions is the
family of all polynomials of degree at most $k - 1$ over the finite field GF($n$) [WC81]. Each such polynomial
can be represented using $O(k \log n)$ bits and evaluated using $O(k)$ additions and multiplications
over GF($n$). More efficient constructions exist, but the polynomial construction suffices here. It
can be extended to any integer $n$ by allowing a small statistical distance from the distribution
$(f(i_1), \ldots, f(i_k))$ to the uniform distribution over $[n]^k$.
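The polynomial construction can be illustrated with a short classical sketch. It assumes the simplest case, where the field size $p$ is a prime (not a general prime power), so that arithmetic modulo $p$ is field arithmetic. The function names `eval_poly`, `sample_poly_hash` and `joint_distribution` are hypothetical and used only for this illustration.

```python
import random
from collections import Counter
from itertools import product

def eval_poly(coeffs, i, p):
    """Evaluate c_0 + c_1 i + ... + c_{k-1} i^{k-1} mod p using Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * i + c) % p
    return acc

def sample_poly_hash(k, p, rng=random):
    """Draw f uniformly from the polynomials of degree at most k-1 over GF(p).
    Assumes p is prime; f is described by k coefficients, i.e. O(k log p) bits."""
    coeffs = [rng.randrange(p) for _ in range(k)]
    return lambda i: eval_poly(coeffs, i, p)

def joint_distribution(k, p, points):
    """Tabulate (f(i_1), ..., f(i_k)) over the whole family by exhaustive enumeration."""
    return Counter(
        tuple(eval_poly(coeffs, i, p) for i in points)
        for coeffs in product(range(p), repeat=k)
    )
```

For small parameters, enumerating the whole family shows that every tuple of values at $k$ distinct points occurs exactly once, which is the uniformity required by Definition 5.3. Each evaluation uses $k$ multiplications and $k$ additions modulo $p$.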

For simplicity, we now assume that we have at our disposal a (perfect) 3-wise independent
family $F$ of functions $f : [n] \to [n]$.



Theorem 5.4 (Outer walk setup cost $S_C$). We can construct, in time $\tilde{O}(s_1 + s_2\sqrt{n/s_1})$, a state
$\binom{n_2}{s_2}^{-1/2} \sum_{S_2 \in \binom{\mathcal{P}(A_1, A_2)}{s_2}} |Q(S_2)\rangle$ for $A_1, A_2$ random variables such that

1. $|A_1| = |A_2| = \frac{n}{3}$;

2. $A_1, A_2, A_3 := [n] \setminus (A_1 \cup A_2)$ is a tripartition of $[n]$;

3. if $x$ has a unique 3-collision $\{i, j, k\}$, $\Pr(i \in A_1, j \in A_2, k \in A_3) = \Omega(1)$;

4. the space complexity of storing the partition $(A_1, A_2, A_3)$ is $\tilde{O}(s_1)$; and

5. the time complexity of determining to which of $A_1$, $A_2$ or $A_3$ an index $i \in [n]$ belongs is $\tilde{O}(1)$.

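Proof. This proof is similar to that of Theorem 4.2, but we include significantly more detail. Let
$f \in F$ be a 3-wise independent function, and define $\tilde{A}_1$, $\tilde{A}_2$ and $\tilde{A}_3$ by $f(i) \le \frac{n}{3} + s_1 - s_2 \Leftrightarrow i \in \tilde{A}_1$,
$\frac{n}{3} + s_1 - s_2 < f(i) \le \frac{2n}{3} + s_1 - s_2 \Leftrightarrow i \in \tilde{A}_2$, and $f(i) > \frac{2n}{3} + s_1 - s_2 \Leftrightarrow i \in \tilde{A}_3$. Then $\tilde{A}_1, \tilde{A}_2, \tilde{A}_3$ is
a partition of $[n]$ with $|\tilde{A}_1| = \frac{n}{3} + s_1 - s_2$, $|\tilde{A}_2| = \frac{n}{3}$, and $|\tilde{A}_3| = \frac{n}{3} - s_1 + s_2$.

If $x$ has a unique 3-collision $\{i, j, k\}$, then the 3-wise independence of $f$ implies that

$$\Pr\left(i \in \tilde{A}_1, j \in \tilde{A}_2, k \in \tilde{A}_3\right) = \Pr\left(f(i) \le \tfrac{n}{3} + s_1 - s_2,\; \tfrac{n}{3} + s_1 - s_2 < f(j) \le \tfrac{2n}{3} + s_1 - s_2,\; f(k) > \tfrac{2n}{3} + s_1 - s_2\right) \ge \left(\tfrac{1}{3} - o(1)\right)^3 = \tfrac{1}{27} - o(1).$$

We assume that this holds when constructing the starting state. Otherwise, our construction fails
and we try again. Let $\tilde{n}_2 := |\mathcal{P}(\tilde{A}_1, \tilde{A}_2)|$. We can assume that $\tilde{n}_2 \in \Omega(n)$ for the same reason we
can always assume that $x$ has $\Omega(n)$ 2-collisions.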
To begin, we create a uniform superposition $\binom{|\tilde{A}_1|}{s_1}^{-1/2} \sum_{I \in \binom{\tilde{A}_1}{s_1}} |Q(I)\rangle$ of sets of $s_1$ indices drawn
from $\tilde{A}_1$, stored in a skip-table, and query these indices. This uses $s_1$ queries and $s_1$ insertions, for
a total time complexity of $\tilde{O}(s_1)$.

For $I \subset \tilde{A}_1$, let $H(I) := \{j \in \tilde{A}_2 : \exists i \in I, x_j = x_i\}$. Next, we search $\tilde{A}_2$ for indices in $H(I)$,
assuming that $H(I)$ has size at least $\Omega(s_1)$. The following lemma justifies this assumption.

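Lemma 5.5. Let $I$ be a uniformly random subset of $\tilde{A}_1$ of size $s_1$. Then $\Pr_I(|H(I)| < \epsilon s_1) \le o(1)$ for a suitable constant $\epsilon > 0$.

Proof. The random variable $|H(I)|$ has a hypergeometric distribution with mean $\mu = \frac{\tilde{n}_2 s_1}{n/3 + s_1 - s_2} = \Omega(s_1)$.
Using tail inequalities [Ska13, eq. 14] we have, for any constant $c > 1$,

$$\Pr\left(|H(I)| \le \tfrac{1}{c}\mu\right) \le \exp\left(-2\left(\frac{(1 - 1/c)\,\mu}{s_1}\right)^2 s_1\right) \le e^{-\Omega(s_1)} = o(1). \qquad \square$$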
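The tail bound can be compared against the exact hypergeometric distribution for small parameters. The sketch below is illustrative only: `N` stands for $|\tilde{A}_1|$, `K` for the number of indices of $\tilde{A}_1$ that have a colliding partner in $\tilde{A}_2$, and `s` for $s_1$; these names and the helper functions are assumptions made for this example.

```python
from math import comb, exp

def hypergeom_cdf(threshold, N, K, s):
    """Exact Pr[X <= threshold], where X counts the marked items in a
    uniformly random s-subset of N items, K of which are marked."""
    total = comb(N, s)
    favourable = sum(comb(K, h) * comb(N - K, s - h) for h in range(threshold + 1))
    return favourable / total

def tail_bound(N, K, s, c):
    """The bound exp(-2 ((1 - 1/c) mu / s)^2 s) with mu = s K / N."""
    mu = s * K / N
    t = (1 - 1 / c) * mu / s
    return exp(-2 * t * t * s)

def tail_check(N, K, s, c):
    """Return (exact probability of X <= mu / c, tail bound)."""
    mu = s * K / N
    return hypergeom_cdf(int(mu / c), N, K, s), tail_bound(N, K, s, c)
```

For instance, `tail_check(60, 30, 20, 2)` compares the exact probability that at most $\mu/2 = 5$ of the 20 sampled items are marked with the bound $e^{-2.5}$.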




Thus we can restrict our attention to the part of the state with $|H(I)| \ge \epsilon s_1$ for some sufficiently small constant
$\epsilon$, as this part has $1 - o(1)$ of the weight. We can then perform the mapping

$$|Q(I)\rangle \mapsto |Q(I)\rangle\, \binom{|H(I)|}{s_2}^{-1/2} \sum_{J \in \binom{H(I)}{s_2}} |Q(J)\rangle$$

using $s_2$ applications of Grover search for a new element of $H(I)$. Each search requires
$\tilde{O}(\sqrt{n/|H(I)|}) = \tilde{O}(\sqrt{n/s_1})$ iterations. As we find elements, we insert them into a skip-table,
also separately recording the order in which we find the indices. To check if some $i$ is in $H(I)$, but
not already found, we

• look up $i$ in the skip-table of indices already found;

• query $x_i$;

• compute $f(i)$; and

• look up $x_i$ in $Q(I)$.

Each of these operations has time complexity $\tilde{O}(1)$, for a total cost of $\tilde{O}(1)$ per iteration. The total
cost of the $s_2$ rounds of search is $\tilde{O}(s_2\sqrt{n/s_1})$. Finally, we must uncompute the order in which we
found the indices of $J$. For each $J$, we have the state $|Q(J)\rangle\, (s_2!)^{-1/2} \sum_{\sigma \in S_{s_2}} |\sigma(J)\rangle$, where $S_n$ is the
symmetric group on $n$ symbols. We can uncompute the order register in cost $\tilde{O}(s_2)$, completing
the desired mapping.
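The four-step membership test evaluated inside each Grover iteration can be sketched classically as follows. This is only an illustration of the predicate, not of the quantum search: Python sets and dictionaries stand in for skip-tables, and the names `is_new_collision`, `part_of`, `values_in_I` and `found` are hypothetical.

```python
def is_new_collision(i, x, part_of, values_in_I, found):
    """Return True iff index i lies in H(I) and has not been found yet.

    x           -- the input sequence (x[i] models one query)
    part_of     -- maps an index to 1, 2 or 3 (computed from f in the text)
    values_in_I -- the set of queried values stored in Q(I)
    found       -- indices of H(I) found in earlier rounds
    """
    if i in found:                # look up i among the indices already found
        return False
    xi = x[i]                     # query x_i
    if part_of(i) != 2:           # compute f(i): i must lie in the second part
        return False
    return xi in values_in_I      # look up x_i in Q(I)
```

Each step is a single lookup, query or function evaluation, which is why one evaluation of the predicate costs $\tilde{O}(1)$ once the sets are held in the data structure of Lemma 5.2.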

Let $I_1$ be the elements of $I$ for which we did not find a collision, and $I_2$ those elements
of $I$ for which we did find a collision. We can reversibly convert $|Q(I), Q(J)\rangle$ to
$|Q(I_1), \{(i, j, x_i) : i \in I_2, j \in J, x_i = x_j\}\rangle$, where both sets are stored in a skip-table. We call the
second set $Q(S_2)$. To accomplish this mapping, we do the following $s_2$ times, once for each $j \in J$:

• look up $x_j$ in $Q(I)$ to find $(i, x_i = x_j)$;

• insert $(i, j, x_i)$ into $Q(S_2)$; and

• delete $(i, x_i)$ from $I$ and $(j, x_j)$ from $J$.

What remains in $Q(I)$ after performing these steps is exactly $Q(I_1)$. Each repetition costs $\tilde{O}(1)$, for
a total cost of $\tilde{O}(s_2)$. Note that we can delete $(i, x_i)$ every time because all 2-collisions in $\tilde{A}_1 \times \tilde{A}_2$
are disjoint (by assumption), so $|I_1| = s_1 - s_2$. We also have $|H(I_1)| = |H(I)| - s_2$, again because
all 2-collisions in $\tilde{A}_1 \times \tilde{A}_2$ are disjoint. Thus, after performing the full mapping, the part of the
state under consideration is

$$\binom{|\tilde{A}_1|}{s_1}^{-1/2} \sum_{I_1 : |H(I_1)| \ge \epsilon s_1 - s_2} \binom{|H(I_1)| + s_2}{s_2}^{-1/2} |Q(I_1)\rangle \sum_{S_2 \in \binom{\mathcal{P}(\tilde{A}_1 \setminus I_1, \tilde{A}_2)}{s_2}} |Q(S_2)\rangle.$$

Measuring the first register, containing some $|Q(I_1)\rangle$, gives the state

$$\binom{|\mathcal{P}(\tilde{A}_1 \setminus I_1, \tilde{A}_2)|}{s_2}^{-1/2} \sum_{S_2 \in \binom{\mathcal{P}(\tilde{A}_1 \setminus I_1, \tilde{A}_2)}{s_2}} |Q(S_2)\rangle$$

for some $I_1$ with probability at least $1 - o(1)$. Adding up the total cost, we find $S_C = \tilde{O}(s_1 +
s_2\sqrt{n/s_1} + s_2) = \tilde{O}(s_1 + s_2\sqrt{n/s_1})$, since $n \ge s_1$.
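The conversion of $(Q(I), Q(J))$ into $(Q(I_1), Q(S_2))$ can be mirrored by a short classical routine. This is a sketch under the assumption, taken from the text, that every value in $J$ matches exactly one index of $I$; dictionaries stand in for skip-tables and the name `pair_up` is hypothetical.

```python
def pair_up(Q_I, Q_J):
    """Classical analogue of (Q(I), Q(J)) -> (Q(I_1), Q(S_2)).

    Q_I, Q_J -- dicts mapping a queried index to its value.
    Returns the unmatched part of Q_I and the set of triples (i, j, x_i).
    """
    by_value = {v: i for i, v in Q_I.items()}   # lookup of x_j in Q(I)
    Q_I1 = dict(Q_I)
    Q_S2 = set()
    for j, xj in Q_J.items():                   # one round per j in J
        i = by_value[xj]                        # find (i, x_i = x_j)
        Q_S2.add((i, j, xj))                    # insert (i, j, x_i) into Q(S_2)
        del Q_I1[i]                             # delete (i, x_i) from I
    return Q_I1, Q_S2
```

Each round performs one lookup, one insertion and one deletion, and the size of the remaining set drops by exactly one per round, which reflects $|I_1| = s_1 - s_2$.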

This state is the correct starting state for the partition $A_1 = \tilde{A}_1 \setminus I_1$, $A_2 = \tilde{A}_2$, $A_3 = \tilde{A}_3 \cup I_1$,
which is clearly a tripartition with $|A_1| = |A_2| = |A_3| = n/3$. Furthermore, for a 3-collision
$\{i, j, k\}$, assuming $i \in \tilde{A}_1$, $j \in \tilde{A}_2$, $k \in \tilde{A}_3$ (which happens with constant probability), the only way
we can fail to have $i \in A_1$, $j \in A_2$, $k \in A_3$ is if $i \in I_1$. Although $I_1$ is not uniformly distributed,
$\Pr(I_1 \mid |H(I_1)| = h)$ is uniform for any $h$. Furthermore, $\Pr(i \in I_1 \mid |H(I_1)| = h) = o(1)$ for any fixed $h$,
since $|I_1| \ll n$. Thus, we have $\Pr_{A_1, A_2, A_3}(i \in A_1, j \in A_2, k \in A_3) = \Omega(1)$.

Finally, to store the tripartition $A_1, A_2, A_3$, we need to keep $f$, as well as $I_1$, which we store in
a skip-table. This takes space $\tilde{O}(1) + \tilde{O}(|I_1|) = \tilde{O}(s_1)$. To compute which of $A_1$, $A_2$, or $A_3$ contains
an index $i$, we first compute $f(i)$, and then (possibly) look up $i$ in $I_1$, each of which costs $\tilde{O}(1)$. □
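The final lookup, which decides the part of an index from $f$ and $I_1$, can be written as a small function. This is a sketch under stated assumptions: `f` is taken to return values in $\{1, \ldots, n\}$, $n$ is assumed divisible by 3, `I1` is an ordinary Python set standing in for the skip-table, and the name `which_part` is hypothetical.

```python
def which_part(i, f, n, s1, s2, I1):
    """Return 1, 2 or 3 according to which of A_1, A_2, A_3 contains index i."""
    t1 = n // 3 + s1 - s2          # upper threshold for the initial first part
    t2 = 2 * n // 3 + s1 - s2      # upper threshold for the initial second part
    v = f(i)
    if v <= t1:
        # Initial first part; indices in I_1 were moved to the third part.
        return 3 if i in I1 else 1
    if v <= t2:
        return 2
    return 3
```

Only one evaluation of $f$ and at most one lookup in $I_1$ are needed, in line with the stated $\tilde{O}(1)$ cost per index.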

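Theorem 5.6 (Costs of the update walk $S'$, $\frac{1}{\sqrt{\varepsilon'}}(\frac{1}{\sqrt{\delta'}}U' + C')$). The update walk has time complexities
$S' = \tilde{O}(s_1)$ and $\frac{1}{\sqrt{\varepsilon'}}(\frac{1}{\sqrt{\delta'}}U' + C') = \tilde{O}(\sqrt{nm/s_1})$.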

Proof. This follows from Theorem 4.3 and our encoding of a triple $(Q(S_2), Q(S_1), Q(S_1'))$. The
implementation is nearly identical to the time-efficient Element Distinctness algorithm of [Amb04],
except that we store an extra skip-table containing the set $Q(\mathcal{P}(S_1))$. However, insertion and
deletion may still be performed in poly-logarithmic time. To insert $i$ into $S_1$, we must look up $x_i$ in
$Q(S_1)$ to see if we have a new collision in $Q(S_1)$, i.e., if there is some $(j, x_i)$ already in $Q(S_1)$ such
that $(i, j) \in (A_1 \times A_2) \cup (A_2 \times A_1)$. If there is such a $j$, then we insert $(i, j, x_i)$ into $Q(\mathcal{P}(S_1))$ if
$(i, j) \in A_1 \times A_2$, or $(j, i, x_i)$ into $Q(\mathcal{P}(S_1))$ otherwise. Finally, we insert $(i, x_i)$ into $Q(S_1)$. This involves
a constant number of skip-table insertions and lookups, so its cost is still poly-logarithmic. We
can delete some $i \in S_1$ by running this operation in reverse. Thus, the update and setup costs are
clearly the same as in [Amb04]. From the proof of Theorem 4.3 we have

$$S' = \tilde{O}(s_1), \qquad U' = \tilde{O}(1), \qquad \delta' = \Omega(1/s_1), \qquad \varepsilon' = \Omega(1).$$
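The bookkeeping for a vertex $S_1$ of the update walk (the set $Q(S_1)$, the collision set $Q(\mathcal{P}(S_1))$ and the marked test) can be sketched classically. Python dictionaries and sets stand in for skip-tables, so this illustrates the logic of insertion and deletion only, not the poly-logarithmic quantum data structure; the class name `UpdateVertex` and the callback `part_of` are hypothetical.

```python
class UpdateVertex:
    """Classical stand-in for the encoding of Q(S_1) together with Q(P(S_1))."""

    def __init__(self, part_of, m):
        self.part_of = part_of   # maps an index to 1 (in A_1) or 2 (in A_2)
        self.m = m               # a vertex is marked with at least m collision pairs
        self.Q = {}              # Q(S_1): index -> queried value
        self.by_value = {}       # value -> indices of S_1 holding that value
        self.pairs = set()       # Q(P(S_1)): triples (i in A_1, j in A_2, value)

    def insert(self, i, xi):
        # Look up x_i to detect a new collision across A_1 x A_2.
        for j in self.by_value.get(xi, ()):
            if self.part_of(i) != self.part_of(j):
                a, b = (i, j) if self.part_of(i) == 1 else (j, i)
                self.pairs.add((a, b, xi))
        self.Q[i] = xi
        self.by_value.setdefault(xi, set()).add(i)

    def delete(self, i):
        # Reverse of insert: drop i and any collision pair it took part in.
        xi = self.Q.pop(i)
        self.by_value[xi].discard(i)
        for j in self.by_value[xi]:
            self.pairs.discard((i, j, xi))
            self.pairs.discard((j, i, xi))

    def is_marked(self):
        # Reading the size of the collision set plays the role of the counter.
        return len(self.pairs) >= self.m
```

Because collisions are assumed disjoint, each insertion or deletion touches at most one collision pair, which corresponds to the constant number of skip-table operations used in the proof.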

To check if $S_1$ is marked, we simply read the counter storing the size of $Q(\mathcal{P}(S_1))$ and check if
it is at least $m$, in time $C' = \tilde{O}(1)$. Thus, we have $\frac{1}{\sqrt{\varepsilon'}}(\frac{1}{\sqrt{\delta'}}U' + C') = \tilde{O}(\sqrt{s_1})$. □

Since $|\psi(S_2, S_2')\rangle = |\psi(S_2', S_2)\rangle$ for all edges $(S_2, S_2')$ by Lemma 4.4, we have the following.

Theorem 5.7 (Garbage Swap cost). We can implement the Garbage Swap in time $\tilde{O}(m)$.

Proof. The Garbage Swap is the operation that acts, for any edge $(S_2, S_2')$, as

$$|Q(S_2), Q(S_2')\rangle\, |\psi(S_2, S_2')\rangle \mapsto |Q(S_2'), Q(S_2)\rangle\, |\psi(S_2', S_2)\rangle.$$

By Lemma 4.4, we need only consider the cost of $|Q(S_2), Q(S_2')\rangle \mapsto |Q(S_2'), Q(S_2)\rangle$. Recall that
$|Q(S_2), Q(S_2')\rangle$ is encoded as $|Q(S_2)\rangle\, |Q(S_2 \setminus S_2')\rangle\, |Q(S_2' \setminus S_2)\rangle$, with each of the three parts encoded
as a skip-table. Since $(S_2, S_2')$ is an edge, we have $|S_2 \setminus S_2'| = |S_2' \setminus S_2| = m$. Thus, we can perform
the mapping to $|Q(S_2')\rangle\, |Q(S_2' \setminus S_2)\rangle\, |Q(S_2 \setminus S_2')\rangle$ by performing $m$ insertions and $m$ deletions on
$Q(S_2)$ to get $Q(S_2')$. □
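On the classical part of the encoding, the Garbage Swap amounts to $m$ deletions, $m$ insertions and an exchange of the two difference registers. The following sketch illustrates this; Python sets stand in for skip-tables and the name `garbage_swap` is hypothetical.

```python
def garbage_swap(Q_S2, removed, added):
    """Map the encoding (Q(S2), Q(S2 minus S2'), Q(S2' minus S2))
    to (Q(S2'), Q(S2' minus S2), Q(S2 minus S2')).

    removed -- the m triples in S2 but not in S2'
    added   -- the m triples in S2' but not in S2
    """
    Q_new = set(Q_S2)
    for t in removed:        # m deletions
        Q_new.remove(t)
    for t in added:          # m insertions
        Q_new.add(t)
    # The two difference sets exchange roles in the new encoding.
    return Q_new, set(added), set(removed)
```

Applying the function twice returns the original encoding, as expected of a swap.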

Theorem 5.8 (Local Diffusion with Garbage cost). We can implement the Local Diffusion
with Garbage with time complexity $\tilde{O}(m)$.

Proof. We consider each of the three steps from the proof of Theorem 4.6. In step 1, we create
a superposition over sets of $m$ values in $Q(S_2)$ using $m$ superposition accesses to the skip-table
storing $Q(S_2)$. In step 2 we create a superposition over sets of $m$ values in $Q(\mathcal{P}(S_1))$ using $m$
superposition accesses to the skip-table storing $Q(\mathcal{P}(S_1))$. Finally, in step 3 we perform $m$ lookups
in the skip-table storing $Q(J) = Q(S_2' \setminus S_2)$ and $2m$ deletions from the skip-table storing $Q(S_1)$.
The total cost of this is $\tilde{O}(m)$.

It is also clear from the proof of Theorem 4.6 that we move from a superposition of correctly
encoded triples $(Q(S_2), Q(S_1), 0)$ (where the 0 corresponds to the coin of $S_1$) to a superposition of
correctly encoded triples $(Q(S_2), Q(S_2'), Q(S_1))$. □

Corollary 5.9 (Local Diffusion and Database Swap cost $T$). The family $(P^{S_2}, M^{S_2}, d^{S_2})_{S_2 \in X}$
implements the Local Diffusion and Database Swap of $(P, Q)$ with time complexity $T = \tilde{O}(m)$.

Proof. This is immediate from Theorems 5.7 and 5.8. □

Theorem 5.10 (Checking cost $C$). We can implement the checking reflection in time $C = \tilde{O}(\sqrt{n})$.

Proof. To check if a vertex $S_2$ is marked, we search for an index $k \in A_3$ such that there exists
$(i, j) \in S_2$ such that $\{i, j, k\}$ is a 3-collision. Each time we check if a particular $k$ has this property,
we query $k$ and look up $x_k$ in $Q(S_2)$ in time $\tilde{O}(1)$. □

Theorem 5.1 now follows. All costs are the same as their query complexities from Section 4, up to polylogarithmic factors,
with the exception of $T = \tilde{O}(m) = \tilde{O}(s_1^2/n)$, but this does not change the asymptotic complexity.
Plugging the values from Theorems 5.4, 5.6, and 5.10 and Corollary 5.9 into the nested update cost
expression from Theorem 3.3, we have

$$S_C + S' + \frac{1}{\sqrt{\varepsilon}}\left(\frac{1}{\sqrt{\delta}}\left(\frac{1}{\sqrt{\varepsilon'}}\left(\frac{1}{\sqrt{\delta'}}U' + C'\right) + T\right) + C\right),$$

which is still optimized by setting $s_1 = n^{5/7}$ and $s_2 = n^{4/7}$, giving time complexity $\tilde{O}(n^{5/7})$.
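The parameter choice can be sanity-checked by tracking only the exponent of $n$ in each term of the cost expression. The sketch below rests on assumptions that are not restated in this section: $n_2 = \Theta(n)$, an outer-walk marked fraction $\varepsilon = \Omega(s_2/n)$ and an outer-walk spectral gap $\delta = \Omega(m/s_2)$. The function name `exponents` and the dictionary keys are hypothetical.

```python
from fractions import Fraction as F

def exponents(a1, a2):
    """Exponent of n in each cost term, for s1 = n^a1 and s2 = n^a2.

    Assumptions (labelled, not taken from this section): n2 = Theta(n),
    eps = Omega(s2 / n) and delta = Omega(m / s2) for the outer walk.
    """
    am = 2 * a1 - 1                       # m = s1^2 n2 / n^2, i.e. about s1^2 / n
    inv_sqrt_eps = (1 - a2) / 2           # 1/sqrt(eps) = sqrt(n / s2)
    inv_sqrt_delta = (a2 - am) / 2        # 1/sqrt(delta) = sqrt(s2 / m)
    update = a1 / 2                       # update walk cost sqrt(s1)
    return {
        "setup": max(a1, a2 + (1 - a1) / 2),                 # s1 + s2 sqrt(n/s1)
        "update setup": a1,                                  # S' = s1
        "nested update": inv_sqrt_eps + inv_sqrt_delta + update,
        "diffusion T": inv_sqrt_eps + inv_sqrt_delta + am,   # T = m
        "checking": inv_sqrt_eps + F(1, 2),                  # C = sqrt(n)
    }
```

With `exponents(F(5, 7), F(4, 7))`, no term exceeds the exponent $5/7$; under these assumptions the setup, diffusion and checking terms all attain it.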
6 Conclusion and Future Directions



We have shown that the quantum walk search framework of [MNRS11] can be extended to allow a
data function that depends on both the vertex and the coin, provided certain costs are accounted for.
This extension allows us to implement nested updates, although there may be other applications of
this new framework, as it more generally allows us to consider updates with garbage resulting from
any type of update subroutine. Nested updates provide another tool for quantum walk algorithms
analogous to the nested checking of [JKM13], and we hope that these tools will facilitate further
upper bounds on both time and query complexity.

It remains an open problem to improve the $\tilde{O}(n^{k/(k+1)})$ time complexity upper bound for $k$-Distinctness
for $k > 3$. The 3-Distinctness upper bound of [Bel13] can be extended to a general
$k$-Distinctness upper bound by coming up with an efficient procedure for constructing a starting
state that generalizes our $|\pi\rangle^0$ [Bel13]. Efficiently constructing this generalized starting state would
be a necessary, but not sufficient, condition for generalizing our upper bound to $k > 3$.